Healthcare deserts:
How accessible is US healthcare?
Andrew Kaszpurenko
 Introduction: About me and what I do…
 Data for Good: Mapping US Healthcare Desert
 Data Sources
 Methodologies – Brute Force
 Visualization
 Methodologies – Smarter way
 Result and Summary
Intro: Andrew Kaszpurenko
 Manager of Advanced Analytics at Edwards
Lifesciences THV Division for the last three years.
 Leads a team that uses a variety of Machine Learning &
AI methods to build models that inform leadership and
business partners to help patients get treatment for
Aortic Stenosis.
 Before joining Edwards, Andrew has had over a
decade of experience working in a variety of industries
from Life Insurance, Health Insurance, Finance, and
Direct Primary Care.
Intro: Edwards Lifesciences
Edwards Lifesciences
 Edwards Lifesciences is the global leader in patient-focused medical innovations for
structural heart disease, as well as critical care and surgical monitoring.
 Driven by a passion to help patients, the company collaborates with the world’s
leading clinicians and researchers to address unmet healthcare needs, working to
improve patient outcomes and enhance lives.
 Edwards Lifesciences’ headquarters is in Irvine, CA, with 14,000 employees globally.
Transcatheter Heart Valve (THV) Division
 Transcatheter aortic valve replacement (TAVR) is a minimally invasive procedure
to treat aortic stenosis.
 In the past, many people suffering from severe aortic stenosis had limited options to
replace an unhealthy valve, such as open heart surgery. Since 2011, TAVR has
opened a door of possibilities and options for treating people in the United States with
severe aortic stenosis.
Problem statement
 Medical Background: Aortic valve stenosis that is related to increasing age and the
buildup of calcium deposits on the aortic valve is most common in older people. It
usually doesn't cause symptoms until ages 70 or 80.
 Background: Medicare was looking to revise TAVR coverage decisions and some
organizations were pushing for tighter requirements for which hospitals could
perform the procedure based on volume of other procedures.
 Problem Statement: Under different assumptions about how far a patient is
willing to travel for treatment, how can we show whether a policy change would
have an impact on access?
Desired End Goal
 Map of the United States, color indicating
where the TAVR deserts would be.
 As granular as possible to really show the
 Ability to adjust the distance a patient travels
 Ability to have a few scenarios of hospitals
included or not
 Overlay income and poverty information
Tools & Technical Challenge
 Tableau: Business friendly way to visually display and interface with the data
 Python: To clean & organize the data in a manner that makes it friendly for Tableau
to work with.
Technical Challenge:
 There are 30,000 zip codes in the US, and we don’t want Tableau to redo
nearest point calculations every time the user changes the parameters.
 A nearest point calc would mean about 450M distances, (30,000 x 30,000)/2,
recalculated every time the user changes a parameter
Input Tables
Two tables needed:
 List of Hospitals and their Zip Code
– Plus the different scenarios we want to
 A crosswalk of all the Zip Codes and their Lat,
 Both are publicly available:
List of Hospitals
List of Zip Codes and Lat Long
First Thoughts
Initial Thoughts:
 Create a table outside of Tableau
 Color coded by how far the closest hospital is:
1. Take every zip and calculate its distance to every
other zip.
2. Mask only the hospitals (zips) we are interested in.
3. Take the minimum distance for each zip in the table.
 This means it’s (30k zips x 30k zips)/2 = 450M calcs.
Revise this:
 Only the hospital zips matter as destinations.
 30k zips x 800 hospital zips = 24M calcs
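The two counts above can be checked with a few lines of arithmetic. This is only a sketch: 30,000 and 800 are the rounded figures from the slide, and the variable names are made up for illustration.

```python
# Rounded counts from the slide (not exact figures)
n_zips = 30_000
n_hospital_zips = 800

# Every zip against every other zip, counting each unordered pair once
all_pairs = (n_zips * n_zips) // 2

# Every zip against the hospital zips only
zip_to_hospital = n_zips * n_hospital_zips

print(f"{all_pairs:,}")             # 450,000,000
print(f"{zip_to_hospital:,}")       # 24,000,000
print(all_pairs / zip_to_hospital)  # 18.75
```

Restricting the destinations to hospital zips cuts the work by a bit under 19x before any smarter search is used.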
Methodology – Brute Force
 Import and clean every hospital and every zip code
 Take 30k zip and give it 800 possible
locations to go to and calculate the distance.
 Take the minimum distance for every zip to
all hospital pairs.
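A minimal sketch of the brute-force idea, under stated assumptions: it uses a hand-written haversine (great-circle) formula in place of geopy's geodesic function so that it runs with the standard library only, and the place names and coordinates are approximate, hypothetical stand-ins for the real zip and hospital files.

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points given in degrees."""
    p1, p2 = radians(lat1), radians(lat2)
    dphi = p2 - p1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(p1) * cos(p2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(a))

# Hypothetical, approximate coordinates (not the real input data)
zips = {"Irvine": (33.68, -117.83), "Newark": (40.74, -74.17)}
hospitals = {"Los Angeles": (34.05, -118.24), "New York": (40.71, -74.01)}

nearest = {}
for z, (zlat, zlon) in zips.items():
    # Brute force: distance from this zip to every hospital, then keep the minimum
    dists = {h: haversine_miles(zlat, zlon, hlat, hlon)
             for h, (hlat, hlon) in hospitals.items()}
    best = min(dists, key=dists.get)
    nearest[z] = (best, dists[best])

for z, (h, miles) in nearest.items():
    print(f"{z}: nearest is {h}, about {miles:.0f} miles")
```

Every zip is compared with every hospital, which is exactly the cost the later tree-based approach tries to avoid.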
Code I – Import the Data and Basic clean up
 Import the Zip Code information as a DataFrame
 Import the Hospitals information as a DataFrame
– Some hospitals share the same zip
code as another hospital, so no need to
do the calc more than once
– Merge in the Lat/Long into the Hospital
Code II – Create Master Dataframe and Calc Distance
 Create a new DataFrame with All Hospitals
Mapped to all possible Zip Codes.
– i.e., Hospital 1 will have 30,000 points
– There are 18M rows now
 Now have a Data Frame with every hospital
and all the zips and distance from it
 Min distance along each zip
 Run the function against each row and
return the miles from it.
– Uses geopy’s geodesic function
– On an i7-8650U it takes 1 hr 25 mins
 Taking the clean output
table and loading it into Tableau
 Making the map a dual
axis map to get the
hospital locations overlaid
with the result
 Using a simple color
palette where anyone can
tell what is good or bad.
Setup of Visual
 Adding in Population statistics
– Census
 Income Metrics
Adding Layers of Detail
Methodologies – Faster Way
Divide into smaller problems:
 Lat, Long are just two points on a surface.
 Just search that region
 Using scikit-learn K-D Tree
What’s taking so long:
 Many repetitive calculations that are
– Want to avoid calculating if Irvine is closer
to NYC than Los Angeles. Nearest Neighbor
K-D Tree basics
Divide into a smaller problem:
 Lat, Long are just two points on a surface.
 Divide into smaller problems
 Using scikit-learn K-D Tree
[Figure: K-D Tree example. Points a to j on a 0–10 grid; the root splits on x ≥ 6, each half then splits on y ≥ 6, giving the groups a,b,c,d,e and i,f,h,j,g.]
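To make the splitting idea concrete, here is a rough pure-Python sketch of a 2-D K-D tree with a nearest-neighbor search. It is not scikit-learn's implementation, it uses flat Euclidean distance, and the points and function names are hypothetical.

```python
from math import dist

def build_kdtree(points, depth=0):
    """Recursively split the points on x, then y, then x, ... at the median."""
    if not points:
        return None
    axis = depth % 2
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return {
        "point": pts[mid],
        "axis": axis,
        "left": build_kdtree(pts[:mid], depth + 1),
        "right": build_kdtree(pts[mid + 1:], depth + 1),
    }

def nearest_neighbor(node, target, best=None):
    """Return the stored point closest to target, skipping regions that cannot win."""
    if node is None:
        return best
    if best is None or dist(node["point"], target) < dist(best, target):
        best = node["point"]
    diff = target[node["axis"]] - node["point"][node["axis"]]
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    best = nearest_neighbor(near, target, best)
    # Only cross the splitting line if the other side could hold a closer point
    if abs(diff) < dist(best, target):
        best = nearest_neighbor(far, target, best)
    return best

# Hypothetical points on a 0-10 grid
points = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build_kdtree(points)
print(nearest_neighbor(tree, (9, 2)))  # (8, 1)
```

The pruning test is what removes the repetitive work: whole regions of the map are skipped once a close enough candidate has been found.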
Ball Tree
Divide into a smaller problem:
 Because these points are on a sphere (low
dimensional manifold), distance is great-circle (haversine),
which scikit-learn’s K-D Tree does not support
 But can use scikit-learn’s Ball Tree, which does
[Figure: Ball Tree example on a 0–10 grid]
 Conceptually similar to K-D Trees
Code II – Add in Potential Scenarios
 Using same main data frames
as before, ZipCode and Hospital
 Create Scenarios
 Convert Lat and Long into radians
 Create Dataframes based on
the scenario
Code II – Ball Tree
 Using same 2 main data
frames as before
– ZipCode
– Hospital
 Take the Array and put it into a Ball Tree
 Query the Ball tree against all
zips (30k).
 k=1, how many sites to return
 1 radian of arc ≈ 3959 miles (the Earth’s mean radius), so multiply the result by that to get miles
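A small check of the unit conversion, as a sketch only: the haversine metric works on coordinates in radians and returns an angle in radians, so multiplying by the Earth's radius gives miles. The function name is made up for illustration.

```python
from math import radians

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius; the slide rounds this to 3959

def arc_to_miles(angle_rad):
    """Convert a central angle in radians to miles along the Earth's surface."""
    return angle_rad * EARTH_RADIUS_MILES

# One degree of latitude, converted to radians and then to miles
one_degree_miles = arc_to_miles(radians(1.0))
print(round(one_degree_miles, 1))  # 69.1
```

One degree of latitude coming out near 69 miles is a quick sanity check that the radian inputs and the radius constant are consistent.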
Code II – Final
 Run for each scenario
 Stack on top of itself for each scenario
– Tableau likes long and skinny
– 3 scenarios = 30k x 3 = 90k
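A sketch of the long and skinny layout, using plain Python so it stands alone. The scenario names, zip codes, distances and the 100-mile threshold are all hypothetical.

```python
# Hypothetical per-scenario results: {scenario: {zip: miles to nearest hospital}}
results = {
    "cur": {"92602": 12.0, "59001": 140.0},
    "pot": {"92602": 12.0, "59001": 310.0},
}

# Long and skinny: one row per (zip, scenario) instead of one column per scenario
long_rows = [
    {"ZipCode": z, "scenario": scenario, "Miles": miles}
    for scenario, by_zip in results.items()
    for z, miles in by_zip.items()
]

# A distance parameter then becomes a simple filter on the Miles column
max_miles = 100  # hypothetical travel threshold
deserts = [r for r in long_rows if r["Miles"] > max_miles]

print(len(long_rows))  # 4 rows = 2 zips x 2 scenarios
print(deserts)
```

With this shape the scenario is just another column to filter on, so adding a scenario adds rows and never changes the schema.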
Go to Tableau
 White space is your friend
 Remove grid lines &
default shading
 Use a set palette and font
 Each thing should matter,
if not remove
 Recognize the difference
between a presentation vs
Visual Best Practices
Result and Summary
 This analysis contributed to internal analysis to understand different proposed Medicare
 Help to understand and quantify the geographic limitations to treatment accessibility.
Other applications:
 Logistics and distribution
– Food accessibility, supply chain, etc
Edwards Lifesciences is looking for passionate data professionals, visit
Code 1 - Setup
from sklearn.neighbors import BallTree, KDTree
import numpy as np
import pandas as pd
from geopy.distance import geodesic
def get_scenario_list(df, mask, target_col):
    # returns a target_col subset as a list based on condition defined in mask
    foo_list = df[mask][target_col].to_list()
    return foo_list
# Import ZIP Code geo mapping file
# (path and path2 are directory prefixes assumed to be defined earlier; they are not shown on the slides)
import_cols = ['ZipCode', 'Latitude', 'Longitude', 'ShowMap', 'City', 'State', 'Population',
               'PopOver65', 'Median_Income', 'Average_Income']
df_zip = pd.read_excel(path + "ZipClean.xlsx", sheet_name="UniqueZip",
                       dtype={"ZipCode": str}, usecols=import_cols)
# ZIP code subset of useable zips
df_zip.drop_duplicates(['ZipCode'], keep='first', inplace=True)
df_zip.reset_index(drop=True, inplace=True)
df_zip.rename(columns={'Latitude': 'LAT', 'Longitude': 'LON'}, inplace=True)
df_zip = df_zip[np.isfinite(df_zip['LAT'])].reset_index(drop=True)
# Import Hospital File
df_hos_final = pd.read_excel(path2 + "HospitalFile.xlsx", sheet_name='Hospitals',
                             dtype={"Facility Zip": str})
df_hos_final.rename(columns={"Facility Zip": 'ZipCode'}, inplace=True)
# Merge Map information to hospital file to get Lat long for each hospital ZIP code
df_hos_final = pd.merge(df_hos_final, df_zip[['ZipCode', 'LAT', 'LON']], how="left", on=["ZipCode"])
# drop duplicates
df_hos = df_hos_final[['ZipCode', 'ScenerioCurrent', 'ScenerioPotential', 'LAT', 'LON']].copy()
df_hos.drop_duplicates(['ZipCode'], keep='first', inplace=True)
df_hos.reset_index(drop=True, inplace=True)
df_hos = df_hos[np.isfinite(df_hos['LAT'])].reset_index(drop=True)
Code 2 – Brute Force
# Setup renamed the coordinates to LAT / LON, so rename from those names.
# Work on copies so df_zip and df_hos keep LAT / LON for the Ball Tree step.
df_zip_bf = df_zip.rename(columns={'LAT': 'Latz', 'LON': 'Longz'})
df_hos_bf = df_hos.rename(columns={'LAT': 'Lath', 'LON': 'Longh'})
# Create a new DataFrame with every hospital to every possible zip available
df_dist = pd.merge(df_hos_bf.assign(key=0), df_zip_bf.assign(key=0), on='key').drop('key', axis=1)
df_dist.rename(columns={'ZipCode_x':'Zip_Hos', 'ZipCode_y':'Zip_Map'}, inplace=True)
# Run Distance Calc (geodesic miles, one row at a time, which is the slow part)
df_dist['miles'] = df_dist.apply(
    lambda row: geodesic((row['Latz'], row['Longz']), (row['Lath'], row['Longh'])).miles,
    axis=1)
df_dist.reset_index(drop=True, inplace=True)
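The code on this slide stops at the distance calculation. One possible way to do the "minimum distance for each zip" step is sketched below; it is an assumption about how that step could look, not the author's code. The small df_dist here is a toy stand-in with made-up values, reusing the Zip_Hos, Zip_Map and miles column names from above.

```python
import pandas as pd

# Toy stand-in for the df_dist built above (values are made up)
df_dist = pd.DataFrame({
    "Zip_Hos": ["90001", "10001", "90001", "10001"],
    "Zip_Map": ["92602", "92602", "07030", "07030"],
    "miles":   [35.0, 2440.0, 2445.0, 2.5],
})

# For each map zip, find the row with the smallest distance and keep only that row
nearest_idx = df_dist.groupby("Zip_Map")["miles"].idxmin()
df_min = df_dist.loc[nearest_idx].reset_index(drop=True)

print(df_min)
```

Using idxmin rather than min keeps the nearest hospital's zip alongside the distance, which is handy for overlaying hospital locations later.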
Code 3 – Nearest Neighbor (Ball Tree)
scenario_cur_mask = df_hos['ScenerioCurrent'] == 1
scenario_pot_mask = df_hos['ScenerioPotential'] == 1
# Use dictionary to store all of the scenario lists. This will be iterated through below
scenarios_dict = {
    'cur': get_scenario_list(df_hos, scenario_cur_mask, 'ZipCode'),
    'pot': get_scenario_list(df_hos, scenario_pot_mask, 'ZipCode'),
}
# Creates new columns converting coordinate degrees to radians (both dfs)
for df in [df_hos, df_zip]:
    for col in ['LAT', 'LON']:
        rad = np.deg2rad(df[col].values)
        df[f'{col}_rad'] = rad
# loop through each scenario in scenarios_dict, output BallTree nearest neighbor distance to closest hospital for every U.S. ZIP code
# add a flag for which scenario it is
# output for each scenario will be same length as df_zip
df_dist = pd.DataFrame()
for scenario in scenarios_dict:
    # subset df_hos by each scenario
    zip_list = scenarios_dict[scenario]
    locations_a = df_hos[df_hos['ZipCode'].isin(zip_list)].copy()
    locations_b = df_zip.copy()
    # BallTree nearest neighbor distance (haversine expects [lat, lon] in radians)
    ball = BallTree(locations_a[["LAT_rad", "LON_rad"]].values, metric='haversine')
    distances, indices = ball.query(locations_b[['LAT_rad', 'LON_rad']].values, k=1)
    # haversine returns radians; multiply by the Earth's mean radius to get miles
    distances = distances * 3958.8
    # get distances into a df, concatenate
    df_temp = pd.DataFrame(data=distances, index=locations_b['ZipCode'])
    df_temp.rename(columns={0: 'Miles'}, inplace=True)
    df_temp.reset_index(drop=False, inplace=True)
    df_temp['scenario'] = scenario
    df_dist = pd.concat([df_dist, df_temp], ignore_index=True)
df_dist.reset_index(drop=True, inplace=True)
# merge in map data
cols_zip = ['ZipCode', 'City', 'State', 'Population', 'PopOver65', 'Median_Income', 'Average_Income']
df_dist = pd.merge(df_dist, df_zip[cols_zip], how="left", on=["ZipCode"])
# cols_hos = ['ZipCode', 'Facility Name', 'ScenerioPotential', 'ScenerioCurrent']
# df_dist = pd.merge(df_dist, df_hos_final[cols_hos], how="left", on=["ZipCode"])
df_dist.to_csv(path2 + "ZipDist_Hos_Sklearn.csv", index=False, header=True, encoding='utf8')

Healthcare deserts: How accessible is US healthcare?

  • 1. Healthcare deserts: How accessible is US healthcare? Andrew Kaszpurenko
  • 2. Agenda
      • Introduction: About me and what I do…
      • Data for Good: Mapping US Healthcare Deserts
      • Data Sources
      • Methodologies – Brute Force
      • Visualization
      • Methodologies – Smarter Way
      • Result and Summary
  • 3. Intro: Andrew Kaszpurenko
      • Manager of Advanced Analytics at Edwards Lifesciences THV Division for the last three years.
      • Leads a team that uses a variety of Machine Learning & AI methods to build models that inform leadership and business partners, helping patients get treatment for Aortic Stenosis.
      • Before joining Edwards, Andrew had over a decade of experience working in a variety of industries, including Life Insurance, Health Insurance, Finance, and Direct Primary Care.
  • 4. Intro: Edwards Lifesciences
      • Edwards Lifesciences
          – Edwards Lifesciences is the global leader in patient-focused medical innovations for structural heart disease, as well as critical care and surgical monitoring.
          – Driven by a passion to help patients, the company collaborates with the world’s leading clinicians and researchers to address unmet healthcare needs, working to improve patient outcomes and enhance lives.
          – Edwards Lifesciences is headquartered in Irvine, CA, and has 14,000 employees globally.
      • Transcatheter Heart Valve (THV) Division
          – Transcatheter aortic valve replacement (TAVR) is a minimally invasive procedure to treat aortic stenosis.
          – In the past, many people suffering from severe aortic stenosis had limited options to replace an unhealthy valve, such as open heart surgery. Since 2011, TAVR has opened a door of possibilities and options for treating people in the United States with severe aortic stenosis.
  • 5. Problem statement
      • Medical Background: Aortic valve stenosis related to increasing age and the buildup of calcium deposits on the aortic valve is most common in older people. It usually doesn't cause symptoms until ages 70 or 80.
      • Background: Medicare was looking to revise TAVR coverage decisions, and some organizations were pushing for tighter requirements on which hospitals could perform the procedure, based on their volume of other procedures.
      • Problem Statement: Given different assumptions about how far a patient is willing to travel for treatment, how can we show whether a policy change would have an impact on access?
  • 6. Desired End Goal
      • Map of the United States, with color indicating where the TAVR deserts would be.
      • As granular as possible to really show the deserts.
      • Ability to adjust the distance a patient travels
      • Ability to have a few scenarios of hospitals included or not
      • Overlay income and poverty information
  • 7. Tools & Technical Challenge
      • Tableau: Business-friendly way to visually display and interact with the data
      • Python: To clean and organize the data so that it is easy for Tableau to work with.
      • Technical Challenge:
          – There are 30,000 zip codes in the US, and we don’t want Tableau to redo nearest-point calculations every time the user changes the parameters.
          – A nearest-point calculation over every pair of zips would mean (30,000 x 30,000)/2 = 450M distances recalculated on every parameter change.
  • 8. Input Tables
      • Two tables needed:
          – List of Hospitals and their Zip Code, plus the different scenarios we want to consider
          – A crosswalk of all the Zip Codes and their Lat, Long
      • Both are publicly available:
          – code-latitude-and-longitude/table/
          – General-Information/xubh-q36u
      • [Figures: List of Hospitals; List of Zip Codes and Lat Long]
  • 9. First Thoughts
      • Initial Thoughts:
          – Create a table outside of Tableau
          – Color coded by how far the closest hospital is:
              1. Take every zip and calculate its distance to every other zip.
              2. Mask only the hospitals (zips) we are interested in.
              3. Take the minimum distance for each zip in the table.
          – This means (30k zips x 30k zips)/2 = 450M calcs.
      • Revise this:
          – Only the distances to hospital zips are needed.
          – 30k zips x 800 hospital zips = 24M calcs
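The two calculation counts on this slide can be checked with a few lines of arithmetic. This is only a sketch using the rounded figures quoted on the slide (30,000 zips and 800 hospital zips); the variable names are illustrative and not part of the original code.

```python
# Rounded counts quoted on the slide (assumptions, not exact figures).
N_ZIPS = 30_000
N_HOSPITAL_ZIPS = 800

# Every zip against every other zip, counting each unordered pair once.
all_pairs = (N_ZIPS * N_ZIPS) // 2

# Every zip against hospital zips only.
hospital_pairs = N_ZIPS * N_HOSPITAL_ZIPS

print(f"all zip pairs:      {all_pairs:,}")
print(f"zip-hospital pairs: {hospital_pairs:,}")
print(f"reduction factor:   {all_pairs / hospital_pairs:.2f}x")
```

Restricting the comparison to hospital zips cuts the work by more than an order of magnitude before any smarter search structure is introduced.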
  • 10. Methodology – Brute Force
      • Steps:
          – Import and clean every hospital and every zip
          – Pair each of the 30k zips with all 800 possible hospital locations and calculate the distance for each pair.
          – For every zip, take the minimum distance across all of its hospital pairs.
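The brute-force steps above can be sketched in plain Python. Note the assumptions: this sketch uses the haversine (spherical) formula rather than geopy's geodesic function used in the deck, and the place names, coordinates (approximate city centres), and function names are hypothetical stand-ins for the real zip and hospital tables.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius in miles

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))

def nearest_hospital_brute_force(zips, hospitals):
    """For each zip, scan every hospital and keep the (distance, name) minimum."""
    result = {}
    for zip_code, (zlat, zlon) in zips.items():
        result[zip_code] = min(
            (haversine_miles(zlat, zlon, hlat, hlon), name)
            for name, (hlat, hlon) in hospitals.items()
        )
    return result

# Hypothetical illustrative points, keyed by a label instead of a real zip code.
zips = {"irvine": (33.6846, -117.8265), "newark": (40.7357, -74.1724)}
hospitals = {"los_angeles": (34.0522, -118.2437), "new_york": (40.7128, -74.0060)}

nearest = nearest_hospital_brute_force(zips, hospitals)
for z, (miles, hosp) in nearest.items():
    print(f"{z}: nearest is {hosp} ({miles:.1f} miles)")
```

The cost is one distance evaluation per zip-hospital pair, which is why the full-size run grows to tens of millions of calculations.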
  • 11. Code I – Import the Data and Basic Clean-up
      • Process:
          – Import the Zip Code information as a DataFrame
          – Import the Hospitals information as a DataFrame
              · Some hospitals share the same zip code as another hospital, so there is no need to do the calc more than once
              · Merge the Lat/Long into the Hospital DataFrame
  • 12. Code II – Create Master DataFrame and Calc Distance
      • Process:
          – Create a new DataFrame with all hospitals mapped to all possible Zip Codes.
              · I.e., Hospital 1 will have 30,000 points
              · There are 18M rows now
          – Run the distance function against each row and return the miles.
              · Uses geopy’s geodesic function
              · On an i7-8650U it takes 1 hr 25 min
          – Now have a DataFrame with every hospital, all the zips, and the distance between them
          – Take the min distance for each zip
  • 13. Setup of Visual
      • Process:
          – Taking the clean output table and loading it into Tableau.
          – Making the map a dual-axis map to get the hospital locations overlaid with the result
          – Using a simple color palette where anyone can tell what is good or bad.
  • 14. Adding Layers of Detail
      • Process:
          – Adding in Population statistics – Census
          – Income Metrics –
  • 15. Methodologies – Faster Way
      • What’s taking so long:
          – Many repetitive calculations that are useless.
          – Want to avoid calculating whether Irvine is closer to NYC than to Los Angeles.
      • Divide into smaller problems (Nearest Neighbor):
          – Lat and Long are just the two coordinates of a point on a surface.
          – Just search the region around that point
          – Using scikit-learn's K-D Tree
  • 16. K-D Tree basics
      • Divide into a smaller problem:
          – Lat and Long are just the two coordinates of a point on a surface.
          – Divide into smaller problems
          – Using scikit-learn's K-D Tree
      • [Figure: K-D Tree. Points a to j plotted on a 0 to 10 by 0 to 10 grid, with a tree that splits first on x ≥ 6 and then on y ≥ 6.]
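To make the "divide into smaller problems" idea concrete, here is a minimal two-dimensional K-D tree sketch in plain Python. It is an illustration only, not scikit-learn's implementation: the `Node`, `build`, and `nearest` names are hypothetical, it uses flat Euclidean distance, and the ten points are made up because the coordinates in the slide's figure are not recoverable.

```python
import math
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Node:
    point: Tuple[float, float]
    axis: int                       # 0 = split on x, 1 = split on y
    left: Optional["Node"] = None   # points below the split
    right: Optional["Node"] = None  # points at or above the split

def build(points, depth=0):
    """Recursively split the points at the median, alternating x and y."""
    if not points:
        return None
    axis = depth % 2
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return Node(pts[mid], axis, build(pts[:mid], depth + 1), build(pts[mid + 1:], depth + 1))

def nearest(node, target, best=None):
    """Return (distance, point) for the stored point closest to target."""
    if node is None:
        return best
    d = math.dist(node.point, target)
    if best is None or d < best[0]:
        best = (d, node.point)
    diff = target[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, target, best)
    # Visit the far side only if the splitting line is closer than the best hit so far.
    if abs(diff) < best[0]:
        best = nearest(far, target, best)
    return best

# Hypothetical points on a 0-10 grid.
points = [(1, 9), (2, 3), (4, 1), (3, 7), (5, 4), (6, 8), (7, 2), (8, 8), (7, 9), (9, 6)]
tree = build(points)
dist, nearest_point = nearest(tree, (6.5, 7.5))
print(nearest_point, round(dist, 3))
```

The pruning test inside `nearest` is what avoids the useless comparisons: whole regions of the grid are skipped once they cannot contain a closer point.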
  • 17. Ball Tree
      • Divide into a smaller problem:
          – Because these points lie on a sphere (a low-dimensional manifold), a K-D Tree won’t work here: scikit-learn's KDTree does not support the haversine (great-circle) metric.
          – But scikit-learn's BallTree can be used instead.
          – Conceptually similar to K-D Trees
      • [Figure: Ball Tree. Points a to j plotted on a 0 to 10 by 0 to 10 grid.]
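A small, self-contained sketch of the Ball Tree lookup with the haversine metric follows. The coordinates are approximate city centres used as hypothetical stand-ins for zip and hospital locations; `BallTree`, `metric="haversine"`, and `query` are the scikit-learn calls the deck's own code uses, and the haversine metric expects [latitude, longitude] in radians.

```python
import numpy as np
from sklearn.neighbors import BallTree

EARTH_RADIUS_MILES = 3958.8  # converts haversine output (radians) to miles

# Hypothetical hospital and zip locations in degrees: [latitude, longitude].
hospital_names = ["los_angeles", "new_york"]
hospitals_deg = np.array([[34.0522, -118.2437], [40.7128, -74.0060]])
zip_names = ["irvine", "newark"]
zips_deg = np.array([[33.6846, -117.8265], [40.7357, -74.1724]])

# Build the tree on the hospitals, then query it once per zip.
tree = BallTree(np.deg2rad(hospitals_deg), metric="haversine")
dist_rad, idx = tree.query(np.deg2rad(zips_deg), k=1)  # k=1: closest hospital only

miles = dist_rad[:, 0] * EARTH_RADIUS_MILES
for name, i, m in zip(zip_names, idx[:, 0], miles):
    print(f"{name}: nearest is {hospital_names[i]} ({m:.1f} miles)")
```

Building the tree on the smaller set (hospitals) and querying with the larger set (zips) mirrors the structure of the scenario loop in the deck's final code.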
  • 18. Code II – Add in Potential Scenarios
      • Process:
          – Using the same main DataFrames as before, ZipCode and Hospital
          – Create Scenarios
          – Convert Lat and Long into radians
          – Create DataFrames based on the scenario
  • 19. Code II – Ball Tree
      • Process:
          – Using the same 2 main DataFrames as before: ZipCode and Hospital
          – Query the Ball Tree against all zips (30k).
          – k=1 sets how many nearest sites to return
          – Distances come back in radians; 1 radian on the Earth's surface = 3959 miles
          – Take the resulting array and put it into a DataFrame
  • 20. Code II – Final
      • Process:
          – Run for each scenario
          – Stack the results for each scenario on top of each other.
              · Tableau likes long and skinny data
              · 3 scenarios = 30k x 3 = 90k rows
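The "long and skinny" layout can be sketched with pandas as below. The zip codes and mileages are made-up placeholder values, and the column names simply follow the ones used in the deck's final code (`ZipCode`, `Miles`, `scenario`).

```python
import pandas as pd

# Placeholder zips and per-scenario distances (illustrative values only).
zip_codes = ["00001", "00002", "00003"]
miles_by_scenario = {
    "cur": [5.0, 2.0, 140.0],
    "pot": [5.0, 2.0, 60.0],
}

# One frame per scenario, each tagged with its scenario flag...
frames = [
    pd.DataFrame({"ZipCode": zip_codes, "Miles": miles, "scenario": name})
    for name, miles in miles_by_scenario.items()
]

# ...then stacked vertically so each row is one (zip, scenario) observation.
df_long = pd.concat(frames, ignore_index=True)
print(df_long)
```

With one row per zip and scenario, a single Tableau filter on the `scenario` column switches the whole map between scenarios.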
  • 21. Go to Tableau
  • 22. Visual Best Practices
      • White space is your friend
      • Remove grid lines & default shading
      • Use a set palette and font size
      • Each element should matter; if it doesn't, remove it
      • Recognize the difference between a presentation and a dashboard
  • 23. Result and Summary
      • This analysis contributed to internal analysis to understand different proposed Medicare Guidelines.
      • It helped to understand and quantify the geographic limitations to treatment accessibility.
      • Other applications:
          – Logistics and distribution: food accessibility, supply chain, etc.
      • Edwards Lifesciences is looking for passionate data professionals, visit
  • 24.
  • 25. Code 1 - Setup

      from sklearn.neighbors import BallTree, KDTree
      import numpy as np
      import pandas as pd
      from geopy.distance import geodesic


      def get_scenario_list(df, mask, target_col):
          # returns a target_col subset as a list based on condition defined in mask
          foo_list = df[mask][target_col].to_list()
          return foo_list


      # Import ZIP Code geo mapping file
      import_cols = ['ZipCode', 'Latitude', 'Longitude', 'ShowMap', 'City', 'State',
                     'Population', 'PopOver65', 'Median_Income', 'Average_Income']
      df_zip = pd.read_excel(path + "ZipClean.xlsx", sheet_name="UniqueZip",
                             dtype={"ZipCode": str}, usecols=import_cols)

      # ZIP code subset of useable zips
      df_zip.drop_duplicates(['ZipCode'], keep='first', inplace=True)
      df_zip.reset_index(drop=True, inplace=True)
      df_zip.rename(columns={'Latitude': 'LAT', 'Longitude': 'LON'}, inplace=True)
      df_zip = df_zip[np.isfinite(df_zip['LAT'])].reset_index(drop=True)

      # Import Hospital File
      df_hos_final = pd.read_excel(path2 + "HospitalFile.xlsx", sheet_name='Hospitals',
                                   dtype={"Facility Zip": str})
      df_hos_final.rename(columns={"Facility Zip": 'ZipCode'}, inplace=True)

      # Merge Map information to hospital file to get Lat long for each hospital ZIP code
      df_hos_final = pd.merge(df_hos_final, df_zip[['ZipCode', 'LAT', 'LON']],
                              how="left", on=["ZipCode"])

      # drop duplicates
      df_hos = df_hos_final[['ZipCode', 'ScenerioCurrent', 'ScenerioPotential', 'LAT', 'LON']].copy()
      df_hos.drop_duplicates(['ZipCode'], keep='first', inplace=True)
      df_hos.reset_index(drop=True, inplace=True)
      df_hos = df_hos[np.isfinite(df_hos['LAT'])].reset_index(drop=True)
  • 26. Code 2 – Brute Force

      # Code 1 names the coordinate columns 'LAT' and 'LON', so rename from those
      # (renaming from 'Lat'/'Long' would silently do nothing and the lookup below would fail)
      df_zip.rename(columns={'LAT': 'Latz', 'LON': 'Longz'}, inplace=True)
      df_hos.rename(columns={'LAT': 'Lath', 'LON': 'Longh'}, inplace=True)

      # Create a new DataFrame with every hospital to every possible zip available
      df_dist = pd.merge(df_hos.assign(key=0), df_zip.assign(key=0), on='key').drop('key', axis=1)
      df_dist.rename(columns={'ZipCode_x': 'Zip_Hos', 'ZipCode_y': 'Zip_Map'}, inplace=True)

      # Run Distance Calc
      df_dist['miles'] = df_dist.apply(
          lambda row: geodesic((row['Latz'], row['Longz']), (row['Lath'], row['Longh'])).miles,
          axis=1)
      df_dist.reset_index(drop=True, inplace=True)
  • 27. Code 3 – Nearest Neighbor (Ball Tree)

      scenario_cur_mask = df_hos['ScenerioCurrent'] == 1
      scenario_pot_mask = df_hos['ScenerioPotential'] == 1

      # Use dictionary to store all of the scenario lists. This will be iterated through below
      scenarios_dict = {
          'cur': get_scenario_list(df_hos, scenario_cur_mask, 'ZipCode'),
          'pot': get_scenario_list(df_hos, scenario_pot_mask, 'ZipCode')
      }

      # Creates new columns converting coordinate degrees to radians (both dfs)
      for df in [df_hos, df_zip]:
          for col in ['LAT', 'LON']:
              rad = np.deg2rad(df[col].values)
              df[f'{col}_rad'] = rad

      # loop through each scenario in scenarios_dict, output BallTree nearest neighbor
      # distance to closest hospital for every U.S. ZIP code
      # add a flag for which scenario it is
      # output for each scenario will be same length as df_zip
      df_dist = pd.DataFrame()
      for scenario in scenarios_dict:
          # subset df_hos by each scenario
          zip_list = scenarios_dict[scenario]
          locations_a = df_hos[df_hos['ZipCode'].isin(zip_list)].copy()
          locations_b = df_zip.copy()

          # BallTree nearest neighbor distance
          ball = BallTree(locations_a[["LAT_rad", "LON_rad"]].values, metric='haversine')
          distances, indices = ball.query(locations_b[['LAT_rad', 'LON_rad']].values, k=1)
          distances = distances * 3958.8

          # get distances into a df, concatenate
          df_temp = pd.DataFrame(data=distances, index=locations_b['ZipCode'])
          df_temp.rename(columns={0: 'Miles'}, inplace=True)
          df_temp.reset_index(drop=False, inplace=True)
          df_temp['scenario'] = scenario
          df_dist = pd.concat([df_dist, df_temp], ignore_index=True)

      df_dist.reset_index(drop=True, inplace=True)

      # merge in map data
      cols_zip = ['ZipCode', 'City', 'State', 'Population', 'PopOver65',
                  'Median_Income', 'Average_Income']
      df_dist = pd.merge(df_dist, df_zip[cols_zip], how="left", on=["ZipCode"])
      # cols_hos = ['ZipCode', 'Facility Name', 'ScenerioPotential', 'ScenerioCurrent']
      # df_dist = pd.merge(df_dist, df_hos_final[cols_hos], how="left", on=["ZipCode"])

      df_dist.to_csv(path2 + "ZipDist_Hos_Sklearn.csv",
                     index=False, header=True, encoding='utf8')