Healthcare deserts: How accessible is US healthcare?

Andrew Kaszpurenko

Agenda
 Introduction: About me and what I do…
 Data for Good: Mapping US Healthcare Deserts
 Data Sources
 Methodologies – Brute Force
 Visualization
 Methodologies – Smarter way
 Result and Summary

Intro: Andrew Kaszpurenko
 Manager of Advanced Analytics at Edwards
Lifesciences THV Division for the last three years.
 Leads a team that uses a variety of Machine Learning & AI methods to build models that inform leadership and business partners to help patients get treatment for Aortic Stenosis.
 Before joining Edwards, Andrew had over a decade of experience working in a variety of industries, including Life Insurance, Health Insurance, Finance, and Direct Primary Care.

Intro: Edwards Lifesciences
Edwards Lifesciences
 Edwards Lifesciences is the global leader in patient-focused medical innovations for
structural heart disease, as well as critical care and surgical monitoring.
 Driven by a passion to help patients, the company collaborates with the world’s
leading clinicians and researchers to address unmet healthcare needs, working to
improve patient outcomes and enhance lives.
 Edwards Lifesciences is headquartered in Irvine, CA, with 14,000 employees globally.
Transcatheter Heart Valve (THV) Division
 Transcatheter aortic valve replacement (TAVR) is a minimally invasive procedure to treat aortic stenosis.
 In the past, many people suffering from severe aortic stenosis had limited options to
replace an unhealthy valve, such as open heart surgery. Since 2011, TAVR has
opened a door of possibilities and options for treating people in the United States with
severe aortic stenosis.

Problem statement
 Medical Background: Aortic valve stenosis that is related to increasing age and the
buildup of calcium deposits on the aortic valve is most common in older people. It
usually doesn't cause symptoms until ages 70 or 80.
 Background: Medicare was looking to revise TAVR coverage decisions and some
organizations were pushing for tighter requirements for which hospitals could
perform the procedure based on volume of other procedures.
 Problem Statement: Under different assumptions about how far a patient is willing to travel for treatment, how do we show whether a policy change would affect access?

Desired End Goal
 Map of the United States, color indicating
where the TAVR deserts would be.
 As granular as possible to really show the
deserts.
 Ability to adjust the distance a patient travels
 Ability to have a few scenarios of hospitals
included or not
 Overlay income and poverty information

Tools & Technical Challenge
 Tableau: Business friendly way to visually display and interface with the data
 Python: To clean & organize the data in a manner that makes it friendly for Tableau
to work with.
Technical Challenge:
 There are 30,000 zip codes in the US, and we don't want Tableau to redo nearest-point calculations every time the user changes the parameters.
 A nearest-point calc would mean 450M distances recalculated on every parameter change: (30,000 x 30,000)/2

Input Tables
Two tables needed:
 List of Hospitals and their Zip Code
– Plus the different scenarios we want to
consider
 A crosswalk of all the Zip Codes and their Lat,
Long
 Both are publicly available:
– https://public.opendatasoft.com/explore/dataset/us-zip-code-latitude-and-longitude/table/
– https://data.medicare.gov/Hospital-Compare/Hospital-General-Information/xubh-q36u
List of Hospitals
List of Zip Codes and Lat Long

First Thoughts
Initial Thoughts:
 Create a table outside of Tableau
 Color coded by how far the closest hospital is:
1. Take every zip and calculate its distance to every other zip.
2. Mask only the hospitals (zips) we are interested in.
3. Take the minimum distance for each zip in the table.
 This means it’s (30k zips x 30k zips)/2 = 450M calcs.
Revise this:
 Only need distances to the hospital zips.
 30k zips x 800 hospital zips = 24M calcs
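
The arithmetic behind the two counts can be checked with a quick sketch. The figures are the rounded ones from the slide; the variable names are only illustrative.

```python
# Rough pair counts behind the two approaches (rounded figures from the slide).
n_zips = 30_000          # approximate number of US zip codes
n_hospital_zips = 800    # approximate number of hospital zip codes

# Every zip against every other zip, counting each unordered pair once.
all_pairs = (n_zips * n_zips) // 2

# Every zip against the hospital zips only.
hospital_pairs = n_zips * n_hospital_zips

print(f"all zip pairs:      {all_pairs:,}")
print(f"zip-hospital pairs: {hospital_pairs:,}")
print(f"reduction factor:   {all_pairs / hospital_pairs:.2f}x")
```

Restricting the comparison to hospital zips cuts the work by more than an order of magnitude before any smarter algorithm is involved.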

Methodology – Brute Force
Steps:
 Import and clean every hospital and every
zip
 Pair each of the 30k zips with all 800 possible hospital locations and calculate the distance for each pair.
 For every zip, take the minimum distance across all of its hospital pairs.

Code I – Import the Data and Basic clean up
 Import the Zip Code information as a
DataFrame
 Import the Hospitals information as a
DataFrame
– Some hospitals share the same zip
code as another hospital, so no need to
do the calc more than once
– Merge in the Lat/Long into the Hospital
DataFrame
Process

Code II – Create Master Dataframe and Calc Distance
 Create a new DataFrame with All Hospitals
Mapped to all possible Zip Codes.
– i.e., Hospital 1 will have 30,000 points
– There are 18M rows now
Process
 Run the distance function against each row and return the miles.
– Uses geopy's geodesic function
– On an i7-8650U it takes 1 hr 25 mins
 Now have a DataFrame with every hospital, all the zips, and the distance between them
 Take the min distance for each zip
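
A minimal sketch of the cross join plus the "min distance for each zip" step, on toy data. The coordinates are made-up approximations, and a hand-written haversine function stands in for geopy's geodesic so the block has no extra dependency. The names `haversine_miles`, `pairs` and `nearest` are illustrative, not from the original code.

```python
import math
import pandas as pd

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))

# Toy inputs (hypothetical, approximate coordinates).
zips = pd.DataFrame({
    "Zip_Map": ["92614", "10001", "60601"],
    "Latz": [33.68, 40.75, 41.89],
    "Longz": [-117.83, -73.99, -87.62],
})
hospitals = pd.DataFrame({
    "Zip_Hos": ["90048", "10016"],
    "Lath": [34.07, 40.74],
    "Longh": [-118.37, -73.98],
})

# Every hospital paired with every zip: len(hospitals) * len(zips) rows.
pairs = hospitals.merge(zips, how="cross")
pairs["miles"] = pairs.apply(
    lambda r: haversine_miles(r["Latz"], r["Longz"], r["Lath"], r["Longh"]), axis=1
)

# Keep only the closest hospital for each zip.
closest_rows = pairs.groupby("Zip_Map")["miles"].idxmin()
nearest = pairs.loc[closest_rows, ["Zip_Map", "Zip_Hos", "miles"]].reset_index(drop=True)
print(nearest)
```

The row-wise `apply` is what makes this slow at 18M+ rows; the sketch keeps it only to mirror the approach described on the slide.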

 Taking the clean output
table and loading it into
Tableau.
 Making the map a dual
axis map to get the
hospital locations overlaid
with the result
 Using a simple color palette where anyone can tell what is good or bad.
Process
Setup of Visual

 Adding in Population statistics
– Census
 Income Metrics
– IRS.gov
Process
Adding Layers of Detail

Methodologies – Faster Way
What's taking so long:
 Many repetitive calculations that are useless.
– Want to avoid calculating whether Irvine is closer to NYC than to Los Angeles.
Divide into smaller problems (Nearest Neighbor):
 Lat, Long are just two points on a surface.
 Just search that region
 Using scikit-learn K-D Tree

K-D Tree basics
Divide into a smaller problem:
 Lat, Long are just two points on a surface.
 Divide into smaller problems
 Using scikit-learn K-D Tree
[Figure: K-D Tree. Points a to j plotted on a 0 to 10 by 0 to 10 grid. The tree splits first on x ≥ 6 (groups a,b,c,d,e and i,f,h,j,g), then on y ≥ 6 (groups i,f,h; j,g; a,d,e; b,c).]
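
To make the idea concrete, here is a minimal pure-Python sketch of a 2-D K-D Tree with a nearest-neighbor search. It is an illustration of the technique, not scikit-learn's implementation: the point set is made up, and `build_kdtree` and `nearest` are hypothetical helper names.

```python
from math import dist

def build_kdtree(points, depth=0):
    """Recursively split points at the median, alternating between x and y."""
    if not points:
        return None
    axis = depth % 2
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {
        "point": points[mid],
        "axis": axis,
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }

def nearest(node, target, best=None):
    """Return the stored point closest to target, skipping branches that cannot win."""
    if node is None:
        return best
    if best is None or dist(node["point"], target) < dist(best, target):
        best = node["point"]
    axis = node["axis"]
    diff = target[axis] - node["point"][axis]
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    best = nearest(near, target, best)
    # Only search the far side if the splitting line is closer than the best so far.
    if abs(diff) < dist(best, target):
        best = nearest(far, target, best)
    return best

# Hypothetical points on a 0 to 10 grid.
points = [(1, 2), (2, 8), (3, 5), (4, 1), (5, 9), (6, 3), (7, 7), (8, 1), (9, 6), (9, 9)]
tree = build_kdtree(points)
print(nearest(tree, (8.5, 6)))
```

The pruning test is where the savings come from: whole regions of the map are never visited once a close enough candidate has been found.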

Ball Tree
Divide into a smaller problem:
 Because these points are on a sphere (a low-dimensional manifold), a plain K-D Tree on Lat, Long won't give true great-circle distances
 But can use Ball Tree from scikit-learn, which supports the haversine metric
https://towardsdatascience.com/using-scikit-learns-binary-trees-to-efficiently-find-latitude-and-longitude-neighbors-909979bd929b
https://towardsdatascience.com/tree-algorithms-explained-ball-tree-algorithm-vs-kd-tree-vs-brute-force-9746debcd940
[Figure: Ball Tree. Points a to j plotted on a 0 to 10 by 0 to 10 grid.]
 Conceptually similar to K-D Trees

Code II – Add in Potential Scenarios
 Using same main data frames
as before, ZipCode and
Hospital
 Create Scenarios
 Convert Lat and Long into
Radians
Process
 Create Dataframes based on
the scenario

Code II – Ball Tree
 Using same 2 main data
frames as before
– ZipCode
– Hospital
Process
 Take the Array and put it into a
DataFrame
 Query the Ball tree against all
zips (30k).
 k=1: how many nearest sites to return
 Distances come back in radians; multiply by the Earth's radius (1 radian ≈ 3,959 miles)
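
A small sketch of the query step on toy data, assuming scikit-learn and NumPy are installed. The coordinates are made-up approximations and the array names are illustrative; the Earth radius of 3958.8 miles is the value used in the appendix code.

```python
import numpy as np
from sklearn.neighbors import BallTree

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius; the slide rounds this to 3959

# Hypothetical coordinates in degrees, as [lat, lon].
hospitals_deg = np.array([[34.07, -118.37],   # Los Angeles area
                          [40.74, -73.98]])   # New York area
zips_deg = np.array([[33.68, -117.83],        # Irvine area
                     [41.89, -87.62]])        # Chicago area

# The haversine metric expects [lat, lon] in radians.
tree = BallTree(np.deg2rad(hospitals_deg), metric="haversine")

# k=1 returns only the single nearest hospital for each zip.
dist_rad, idx = tree.query(np.deg2rad(zips_deg), k=1)

# Haversine distances are angles in radians; scale by the Earth's radius for miles.
miles = dist_rad[:, 0] * EARTH_RADIUS_MILES
for z, i, m in zip(zips_deg, idx[:, 0], miles):
    print(f"zip at {z} -> hospital #{i}, about {m:.0f} miles")
```

Both returned arrays have shape (number of zips, k), which is why the first column is selected before building a DataFrame from them.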

Code II – Final
 Run for each scenario
 Stack on top of itself for each
scenario.
– Tableau likes long and skinny
data
– 3 scenarios = 30k x 3 = 90k
rows
Process
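
The stacking step can be sketched as follows. The distances are invented for illustration, and collecting the frames in a list before one `pd.concat` is a design choice of this sketch rather than what the appendix code does.

```python
import pandas as pd

# Hypothetical per-scenario results: nearest-hospital distance in miles for each zip.
zips = ["92614", "10001", "60601"]
results = {
    "cur": [12.0, 3.5, 8.0],
    "pot": [41.0, 3.5, 95.0],
}

frames = []
for scenario, miles in results.items():
    df_temp = pd.DataFrame({"ZipCode": zips, "Miles": miles})
    df_temp["scenario"] = scenario  # flag which scenario each row belongs to
    frames.append(df_temp)

# Long and skinny: n_zips x n_scenarios rows with one "scenario" column to filter on.
df_long = pd.concat(frames, ignore_index=True)
print(df_long)
```

In this shape, a single Tableau filter or parameter on `scenario` switches the whole map, instead of one measure column per scenario.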

Go to Tableau

 White space is your friend
 Remove grid lines &
default shading
 Use a set palette and font size
 Every element should matter; if not, remove it
 Recognize the difference
between a presentation vs
dashboard
Visual Best Practices

Result and Summary
 This work fed into internal analysis of different proposed Medicare guidelines.
 Helped to understand and quantify the geographic limitations on treatment accessibility.
Other applications:
 Logistics and distribution
– Food accessibility, supply chain, etc.
Edwards Lifesciences is looking for passionate data professionals, visit
www.edwards.com

Code 1 - Setup
from sklearn.neighbors import BallTree, KDTree
import numpy as np
import pandas as pd
from geopy.distance import geodesic

# path and path2 are directory prefixes defined elsewhere (not shown on the slides)

def get_scenario_list(df, mask, target_col):
    # returns a target_col subset as a list based on condition defined in mask
    foo_list = df[mask][target_col].to_list()
    return foo_list

# Import ZIP Code geo mapping file
import_cols = ['ZipCode', 'Latitude', 'Longitude', 'ShowMap', 'City', 'State', 'Population',
               'PopOver65', 'Median_Income', 'Average_Income']
df_zip = pd.read_excel(path + "ZipClean.xlsx", sheet_name="UniqueZip",
                       dtype={"ZipCode": str}, usecols=import_cols)
# ZIP code subset of useable zips
df_zip.drop_duplicates(['ZipCode'], keep='first', inplace=True)
df_zip.reset_index(drop=True, inplace=True)
df_zip.rename(columns={'Latitude': 'LAT', 'Longitude': 'LON'}, inplace=True)
df_zip = df_zip[np.isfinite(df_zip['LAT'])].reset_index(drop=True)

# Import Hospital File
df_hos_final = pd.read_excel(path2 + "HospitalFile.xlsx", sheet_name='Hospitals',
                             dtype={"Facility Zip": str})
df_hos_final.rename(columns={"Facility Zip": 'ZipCode'}, inplace=True)
# Merge Map information to hospital file to get Lat long for each hospital ZIP code
df_hos_final = pd.merge(df_hos_final, df_zip[['ZipCode', 'LAT', 'LON']], how="left", on=["ZipCode"])
# drop duplicates (several hospitals can share one ZIP code)
df_hos = df_hos_final[['ZipCode', 'ScenerioCurrent', 'ScenerioPotential', 'LAT', 'LON']].copy()
df_hos.drop_duplicates(['ZipCode'], keep='first', inplace=True)
df_hos.reset_index(drop=True, inplace=True)
df_hos = df_hos[np.isfinite(df_hos['LAT'])].reset_index(drop=True)

Code 2 – Brute Force
# Code 1 renamed the coordinate columns to LAT/LON, so rename from those names.
# Work on renamed copies so df_zip and df_hos stay usable for the Ball Tree code.
zip_bf = df_zip.rename(columns={'LAT': 'Latz', 'LON': 'Longz'})
hos_bf = df_hos.rename(columns={'LAT': 'Lath', 'LON': 'Longh'})
# Create a new DataFrame with every hospital to every possible zip available
df_dist = pd.merge(hos_bf.assign(key=0), zip_bf.assign(key=0), on='key').drop('key', axis=1)
df_dist.rename(columns={'ZipCode_x': 'Zip_Hos', 'ZipCode_y': 'Zip_Map'}, inplace=True)
# Run Distance Calc (geodesic distance in miles, one row at a time)
df_dist['miles'] = df_dist.apply(
    lambda row: geodesic((row['Latz'], row['Longz']), (row['Lath'], row['Longh'])).miles,
    axis=1)
df_dist.reset_index(drop=True, inplace=True)

Code 3 – Nearest Neighbor (Ball Tree)
scenario_cur_mask = df_hos['ScenerioCurrent'] == 1
scenario_pot_mask = df_hos['ScenerioPotential'] == 1
# Use dictionary to store all of the scenario lists. This will be iterated through below
scenarios_dict = {
    'cur': get_scenario_list(df_hos, scenario_cur_mask, 'ZipCode'),
    'pot': get_scenario_list(df_hos, scenario_pot_mask, 'ZipCode'),
}
# Creates new columns converting coordinate degrees to radians (both dfs)
for df in [df_hos, df_zip]:
    for col in ['LAT', 'LON']:
        rad = np.deg2rad(df[col].values)
        df[f'{col}_rad'] = rad

# loop through each scenario in scenarios_dict, output BallTree nearest neighbor distance
# to closest hospital for every U.S. ZIP code
# add a flag for which scenario it is
# output for each scenario will be same length as df_zip
df_dist = pd.DataFrame()
for scenario in scenarios_dict:
    # subset df_hos by each scenario
    zip_list = scenarios_dict[scenario]
    locations_a = df_hos[df_hos['ZipCode'].isin(zip_list)].copy()
    locations_b = df_zip.copy()
    # BallTree nearest neighbor distance
    ball = BallTree(locations_a[["LAT_rad", "LON_rad"]].values, metric='haversine')
    distances, indices = ball.query(locations_b[['LAT_rad', 'LON_rad']].values, k=1)
    distances = distances * 3958.8  # radians to miles (Earth's radius)
    # get distances into a df, concatenate
    df_temp = pd.DataFrame(data=distances, index=locations_b['ZipCode'])
    df_temp.rename(columns={0: 'Miles'}, inplace=True)
    df_temp.reset_index(drop=False, inplace=True)
    df_temp['scenario'] = scenario
    df_dist = pd.concat([df_dist, df_temp], ignore_index=True)
df_dist.reset_index(drop=True, inplace=True)

# merge in map data
cols_zip = ['ZipCode', 'City', 'State', 'Population', 'PopOver65', 'Median_Income', 'Average_Income']
df_dist = pd.merge(df_dist, df_zip[cols_zip], how="left", on=["ZipCode"])
# cols_hos = ['ZipCode', 'Facility Name', 'ScenerioPotential', 'ScenerioCurrent']
# df_dist = pd.merge(df_dist, df_hos_final[cols_hos], how="left", on=["ZipCode"])
df_dist.to_csv(path2 + "ZipDist_Hos_Sklearn.csv", index=False, header=True, encoding='utf8')
