A predictive analytics platform can help healthcare providers identify which patients and team members could be at the highest risk for severe illness / hospitalization.
INTRODUCTION
It has been nearly nine months since many of us first heard the term coronavirus. Although COVID-19 is a new disease, coronaviruses are not new: the first human coronavirus was identified back in 1964 by Dr. June Almeida. Since the WHO declared COVID-19 a pandemic, much of the world has come to a standstill, and healthcare has been among the hardest-hit sectors. Many outpatient visits and elective surgeries have been postponed, and hospitals are primarily treating COVID-19 patients.
The outbreak has driven exponential growth in telehealth and teleconsultation for diagnosis and treatment. However, some patients still need regular in-person visits, for example for cancer treatment, dialysis, or emergency care. They prefer face-to-face consultations but are reluctant to visit hospitals for fear of contracting the disease.
Patients have concerns about safety, travel, and the zone status of hospitals, while clinic / hospital administrators want to implement effective plans to prevent in-hospital infections. They would like insights into:
▪ Which patients are at risk?
▪ Which team members / staff are at risk?
▪ What is the overall clinic-level risk?
A predictive analytics platform can help healthcare providers answer these questions and determine which patients and team members are likely to be at the highest risk of severe illness / hospitalization.
STEPS TO IDENTIFY HIGH RISK PATIENTS
Step 1: Identify the Source Systems
The first step is to identify the data elements required to calculate the risk scores.
1. Patient Risk: Patient risk primarily leverages underlying medical conditions, such as diabetes, hypertension, and chronic kidney disease. These parameters are combined into a customized 'Social Determinants of Health' (SDH) index, which should be calculated at the patient level. The second component to consider is location: ideally, the locations of both the patient and the clinic determine the risk zone from a COVID-19 perspective.
o Patient SDH: A patient-level 'Social Determinants of Health' index that incorporates the social determinants of health specific to each patient.
o Infected Population: A component tracking the number of individuals infected during a specific timeframe in a specific location (e.g., state / county / zip code).
2. TeamMate Risk: Similar to patient risk, staff / team member risk is calculated from the team member's customized 'Social Determinants of Health' index and the infected population. Because clinic / hospital staff work at multiple locations, every location a staff member visits must be considered.
o TeamMate SDH: A 'Social Determinants of Health' index that incorporates the social determinants of health of each staff member.
o Infected Population: A component tracking the number of individuals infected during a specific timeframe in each location the staff member visits (e.g., state / county / zip code).
3. Clinic Risk: Clinic risk depends on the patients and staff visiting a clinic; it is the sum of patient risk and TeamMate risk. Data sources required to calculate clinic risk include:
o Daily COVID-19 cases by state, county, and zip code
o Patient-specific health conditions
o Team member-specific health conditions
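Step 1 leaves the exact scoring formula open, so the following is only a minimal Python sketch under hypothetical assumptions: the customized SDH index is already scaled to [0, 1], the infected-population component is expressed as infections per 1,000 residents per location, and a person's risk is their SDH index multiplied by the average infection rate across the locations they are exposed to (home and clinic for patients, every visited site for staff). Only the last step, clinic risk as the sum of patient risk and TeamMate risk, comes from Step 1 itself. All names, zip codes, and numbers are illustrative.

```python
from dataclasses import dataclass

# Hypothetical infected-population component: infections per 1,000 residents
# by zip code for the forecast window (illustrative values, not real data).
INFECTION_RATE_BY_ZIP = {"ZIP-A": 4.2, "ZIP-B": 7.5, "ZIP-C": 1.8}


@dataclass
class Person:
    person_id: str
    sdh_index: float      # customized SDH index, assumed scaled to [0, 1]
    zip_codes: list[str]  # patients: home + clinic; staff: every site visited


def infection_pressure(zip_codes: list[str]) -> float:
    """Average infection rate across a person's locations.

    Unknown zip codes count as 0.0, which is an assumption of this sketch.
    """
    rates = [INFECTION_RATE_BY_ZIP.get(z, 0.0) for z in zip_codes]
    return sum(rates) / len(rates) if rates else 0.0


def person_risk(person: Person) -> float:
    """Hypothetical combination: SDH index scaled by local infection pressure."""
    return person.sdh_index * infection_pressure(person.zip_codes)


def clinic_risk(patients: list[Person], staff: list[Person]) -> float:
    """Clinic risk as the sum of patient risk and TeamMate risk (Step 1)."""
    return sum(person_risk(p) for p in patients) + sum(person_risk(s) for s in staff)


patients = [Person("P1", 0.8, ["ZIP-B", "ZIP-A"])]
staff = [Person("S1", 0.5, ["ZIP-A", "ZIP-B", "ZIP-C"])]
print(round(clinic_risk(patients, staff), 2))  # 6.93
```

Averaging across locations treats every visited site equally; taking the maximum instead would be a more conservative choice for staff who rotate through a high-incidence clinic.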
Step 2: Extract. Transform. Load.
After identifying the necessary data sources, the data
needs to be standardized for the target data model.
For instance, blood sugar test results can arrive from source systems under many different names, such as:
▪ Blood Sugar Fasting
▪ PP
▪ Sugar Test
▪ Fasting
▪ Post Lunch
▪ Glucose Test
Standardizing these different terms is necessary for data analysis. The standardized data is then loaded into the target data model / data warehouse.
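A minimal sketch of this standardization step in Python, assuming a hand-maintained synonym table. The canonical name `blood_glucose` and the timing attribute are hypothetical; a production ETL pipeline would more likely map to a standard vocabulary such as LOINC. The table keeps fasting and post-meal (PP / Post Lunch) readings distinguishable, since they are interpreted against different reference ranges.

```python
import re
from typing import Optional

# Hypothetical synonym table: normalized source label -> (canonical test, timing).
LAB_SYNONYMS = {
    "blood sugar fasting": ("blood_glucose", "fasting"),
    "fasting": ("blood_glucose", "fasting"),
    "pp": ("blood_glucose", "postprandial"),
    "post lunch": ("blood_glucose", "postprandial"),
    "sugar test": ("blood_glucose", "unspecified"),
    "glucose test": ("blood_glucose", "unspecified"),
}


def normalize_label(raw: str) -> str:
    """Collapse whitespace, hyphens, and underscores; trim; lower-case."""
    return re.sub(r"[\s_\-]+", " ", raw).strip().lower()


def standardize(raw: str) -> Optional[tuple[str, str]]:
    """Return (canonical_test, timing), or None if the label is unmapped."""
    return LAB_SYNONYMS.get(normalize_label(raw))


print(standardize("Blood  Sugar Fasting"))  # ('blood_glucose', 'fasting')
print(standardize("Post-Lunch"))            # ('blood_glucose', 'postprandial')
print(standardize("HbA1c"))                 # None
```

Unmapped labels (returning `None`) would typically be routed to a review queue rather than silently dropped; that routing is an assumption, not something the source specifies.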
Step 3: The Modeling Layer
An important step in identifying high-risk patients is selecting appropriate forecasting models. For instance, to forecast future COVID-19 cases for a specific location (state / county / zip code), three models could be leveraged:
1. SIR - Epidemiological model: This compartmental model is commonly used during epidemics / pandemics. It estimates the number of people likely to be infected over time in a closed population by tracking three compartments at each point in time. These compartments are also used to estimate the basic reproduction number R0 (R naught), the average number of secondary infections caused by one infected individual in a fully susceptible population.
o S – The number of susceptible individuals
o I – The number of infected individuals
o R – The number of recovered individuals
The model is governed by a system of differential equations in these compartments, with the following terms: S(t) is the average number of susceptible individuals at time t; I(t) is the number of infectious individuals at time t; N is the size of the population, which decreases due to disease-induced deaths; β is the contact rate; and γ^(-1), i.e., 1/γ, is the infectious period.
2. ARIMA Time Series Forecasting: Autoregressive Integrated Moving Average (ARIMA) is a time series forecasting model. The first step in this forecasting process is to determine whether the time series is stationary; for a stationary series, the mean and variance remain constant over time. CitiusTech performed an Augmented Dickey-Fuller (ADF) test on the COVID-19 case data and found it to be non-stationary, so the data was converted into a stationary form using a log transformation. Auto ARIMA was then used to determine:
o Number of time lags (p): The order of the autoregressive model
o Degree of differencing (d): The number of times the raw observations are differenced (past values subtracted)
o Order of the moving-average model (q): The number of lagged forecast errors in the model
3. LSTM – Long Short-Term Memory Networks: LSTM is a type of recurrent neural network. LSTMs are sensitive to the scale of the input data, especially when sigmoid or tanh activation functions are used, so the case counts are rescaled (for example, to the range 0 to 1) before training. The network has a visible layer with 1 input, a hidden layer with 4 LSTM blocks (neurons), and an output layer that makes a single-value prediction.
The LSTM blocks use their default activation functions (sigmoid for the gates and tanh for the cell state). The network is trained for 100 epochs with a batch size of 1. Once the model is fit, its performance on the dataset can be evaluated. It is crucial to invert the scaling of the predictions before calculating error scores so that performance is reported in the same units as the original data.
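For the SIR model, the textbook closed-population equations, using the symbols defined above, are dS/dt = -βSI/N, dI/dt = βSI/N - γI, and dR/dt = γI, which give R0 = β/γ. The document's own definitions let N shrink through disease-induced deaths; the Python sketch below assumes the simpler closed form and integrates it with forward Euler steps. The function name and parameter values are illustrative, not fitted to any data.

```python
def simulate_sir(s_init: float, i_init: float, r_init: float,
                 beta: float, gamma: float, days: int,
                 dt: float = 0.1) -> list[tuple[float, float, float]]:
    """Integrate the closed-population SIR equations with forward Euler steps.

    beta is the contact (transmission) rate and gamma the recovery rate
    (1/gamma is the infectious period). dt should divide one day evenly.
    Returns one (S, I, R) tuple per day, starting with the initial state.
    """
    n = s_init + i_init + r_init          # population size, constant here
    s, i, r = s_init, i_init, r_init
    steps_per_day = round(1 / dt)
    history = [(s, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infections = beta * s * i / n * dt   # flow S -> I
            new_recoveries = gamma * i * dt          # flow I -> R
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        history.append((s, i, r))
    return history


# Illustrative parameters: R0 = beta / gamma = 3, infectious period of 10 days.
trajectory = simulate_sir(9_990, 10, 0, beta=0.3, gamma=0.1, days=120)
peak_day = max(range(len(trajectory)), key=lambda d: trajectory[d][1])
print(f"R0 = {0.3 / 0.1:.1f}; infections peak around day {peak_day}")
```

Forward Euler is chosen here only for transparency; a library ODE solver would be the more robust choice for fitting β and γ to observed case data.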
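The ARIMA preprocessing described above (log transformation, then differencing of order d) and its inversion can be sketched in plain Python. In practice the ADF test is commonly run with statsmodels' `adfuller` and the (p, d, q) search with pmdarima's `auto_arima`; this sketch omits both and shows only the transformation arithmetic. The helper names are hypothetical, and the use of `log1p` to tolerate zero-case days is an assumption, not something the source specifies.

```python
import math


def log_difference(series: list[float], d: int = 1) -> list[float]:
    """Log-transform a daily case series, then difference it d times."""
    values = [math.log1p(x) for x in series]  # log1p tolerates zero-case days
    for _ in range(d):
        values = [curr - prev for prev, curr in zip(values, values[1:])]
    return values


def invert_first_difference(anchor_value: float, diffs: list[float]) -> list[float]:
    """Undo log1p plus one round of differencing (the d = 1 case only).

    anchor_value is the observed case count immediately before the first
    differenced step; each diff is added to the running log level.
    """
    level = math.log1p(anchor_value)
    restored = []
    for step in diffs:
        level += step
        restored.append(math.expm1(level))
    return restored


cases = [10, 20, 40, 80]
diffs = log_difference(cases)                    # input for an ARMA(p, q) fit
print(invert_first_difference(cases[0], diffs))  # recovers 20, 40, 80 up to rounding
```

Forecasts produced on the differenced log scale need the same inversion before they are compared with reported case counts.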
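The scaling and inversion steps around the LSTM can be sketched without a deep learning framework. The helper names below are hypothetical; scikit-learn's `MinMaxScaler` provides the same fit / transform / inverse_transform pattern, and the network itself (one LSTM layer with 4 units feeding a single-output dense layer) would be built in a framework such as Keras. `make_windows` with `look_back=1` produces the one-input samples the text describes.

```python
import math


class MinMaxScaler1D:
    """Scale a 1-D series to [0, 1] and map predictions back to original units."""

    def fit(self, series: list[float]) -> "MinMaxScaler1D":
        self.lo, self.hi = min(series), max(series)
        return self

    def _span(self) -> float:
        return (self.hi - self.lo) or 1.0  # avoid dividing by zero on a flat series

    def transform(self, series: list[float]) -> list[float]:
        return [(x - self.lo) / self._span() for x in series]

    def inverse_transform(self, scaled: list[float]) -> list[float]:
        return [x * self._span() + self.lo for x in scaled]


def make_windows(series: list[float],
                 look_back: int = 1) -> tuple[list[list[float]], list[float]]:
    """Build (input window, next value) pairs; look_back=1 gives one input per sample."""
    X = [series[i:i + look_back] for i in range(len(series) - look_back)]
    y = [series[i + look_back] for i in range(len(series) - look_back)]
    return X, y


def rmse(actual: list[float], predicted: list[float]) -> float:
    """Root-mean-square error; compute it on values in original units."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


cases = [120, 150, 180, 260, 310]
scaler = MinMaxScaler1D().fit(cases)
X, y = make_windows(scaler.transform(cases), look_back=1)
# A trained network would return scaled predictions; y is reused as a stand-in.
scaled_predictions = y
print(rmse(cases[1:], scaler.inverse_transform(scaled_predictions)))  # near 0.0
```

Fitting the scaler on the training split only, then reusing it for the test split, avoids leaking test-set minima and maxima into training.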
Finally, the risk scores are calculated from the case forecasts produced by these three models.
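How the three forecasts feed the risk scores is not specified. One hypothetical approach, sketched below in Python, is a weighted average of the SIR, ARIMA, and LSTM forecasts for a location (equal weights by default), converted into a per-capita infection rate that could serve as the Infected Population component from Step 1. The model names, weights, and population figure are assumptions.

```python
from typing import Optional


def ensemble_forecast(forecasts: dict[str, list[float]],
                      weights: Optional[dict[str, float]] = None) -> list[float]:
    """Weighted average of per-model daily case forecasts for one location.

    All forecasts must cover the same horizon. Equal weights are assumed
    when none are given.
    """
    names = list(forecasts)
    if weights is None:
        weights = {name: 1.0 for name in names}
    total_weight = sum(weights[name] for name in names)
    horizon = len(forecasts[names[0]])
    return [sum(weights[name] * forecasts[name][t] for name in names) / total_weight
            for t in range(horizon)]


def infection_rate_per_1000(forecast_cases: list[float], population: int) -> float:
    """Average forecast daily infections per 1,000 residents."""
    return 1000 * sum(forecast_cases) / len(forecast_cases) / population


# Hypothetical two-day forecasts for one zip code of 10,000 residents.
combined = ensemble_forecast({"sir": [10, 20], "arima": [20, 40], "lstm": [30, 60]})
print(combined, infection_rate_per_1000(combined, population=10_000))  # [20.0, 40.0] 3.0
```

Equal weighting is the simplest neutral default; weights could instead be set from each model's recent out-of-sample error.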
Step 4: The Reporting Layer
The predictive analytics platform can be leveraged by
the following key user groups:
▪ Medical Director
▪ Regional Head
▪ Clinic Administrative Head
This would allow each user group to:
▪ Track and monitor risk scores over time
▪ Define steps / measures for top-risk clinics in a specific county
▪ Move patients to different clinics or increase home care based on the risk scores
▪ Monitor the county-level forecast for a specific clinic
End User | Risk Scores: County, Clinic, Patient, Staff
Medical Director | -
Regional Head | - -
Clinic Administrative Head | -
Identifying High Risk Patients - Solution Architecture
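The "top-risk clinics in a specific county" view can be sketched in Python as a filter plus a sort over clinic-level scores. The function name and the tuple layout (clinic ID, county, risk score) are a hypothetical schema, not the platform's actual data model.

```python
def top_risk_clinics(clinic_scores: list[tuple[str, str, float]],
                     county: str, n: int = 3) -> list[tuple[str, float]]:
    """Return up to n (clinic_id, risk_score) pairs in one county, highest risk first."""
    in_county = [(clinic_id, score)
                 for clinic_id, clinic_county, score in clinic_scores
                 if clinic_county == county]
    return sorted(in_county, key=lambda pair: pair[1], reverse=True)[:n]


# Hypothetical clinic-level scores.
scores = [
    ("C1", "County A", 4.1),
    ("C2", "County A", 7.3),
    ("C3", "County B", 9.0),
    ("C4", "County A", 5.5),
]
print(top_risk_clinics(scores, "County A", n=2))  # [('C2', 7.3), ('C4', 5.5)]
```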
CONCLUSION
Similar forecasting tools can be developed for post-COVID-19 scenarios. Healthcare providers can leverage Machine Learning models to anticipate demand when elective surgeries resume after COVID-19. These demand forecasts can support better bed management, hospital inventory, and vendor management.
Healthcare providers can also leverage patient risk
scores for:
▪ Predicting the likelihood of hospital readmissions
within the next 30 days
▪ Identifying patients likely to develop certain diseases, such as heart disease, diabetes, or sepsis
▪ Driving population health management programs
Integrating nonclinical data, such as customized social determinants of health, with clinical data can improve the quality of risk scores and enable better analysis of patient health.
By analyzing risk scores, payers and providers can track a patient's key clinical and lifestyle indicators and intervene early to help prevent serious health conditions. Monitoring risk scores can also help estimate costs and deliver a quality experience while reducing patient risk.
ABOUT THE AUTHORS
Harish Rijhwani
Delivery Lead, CitiusTech
Harish has 18 years of experience working across various healthcare domains, delivering value to clients through technology and business services. Harish holds a Bachelor's degree in Electronics Engineering and a Master's degree in Systems (IT). He is also the author of “Healthcare Decoded – Begin Your Health IT Journey”.
harish.rijhwani@citiustech.com
Sachin Mule
Data Scientist, CitiusTech
Sachin is a data analytics professional with 11+ years of experience. He has worked on key transformation projects across the retail, manufacturing, and healthcare industries. Sachin has solid experience in advanced statistical analysis, data modelling with regression and classification, optimization, and visualization. Sachin holds a Master's degree in Statistics.
sachin.mule@citiustech.com
Banalaxmi Boruah
Healthcare Consultant, CitiusTech
Banalaxmi has 9+ years of experience in US healthcare consulting across providers, provider-sponsored health plans, and the Cerner Millennium Suite for physician workflows and ambulatory care. She has worked on several healthcare projects involving ETL requirements, supply chain, ambulatory care, and revenue cycle management.
banalaxmi.boruah@citiustech.com