Healthcare deserts:
How accessible is US
healthcare?
Andrew Kaszpurenko
Agenda
 Introduction: About me and what I do…
 Data for Good: Mapping US Healthcare Desert
 Data Sources
 Methodologies – Brute Force
 Visualization
 Methodologies – Smarter way
 Result and Summary
Intro: Andrew Kaszpurenko
 Manager of Advanced Analytics at Edwards
Lifesciences THV Division for the last three years.
 Lead a team that uses a variety of Machine Learning &
AI methods to build models that inform leadership and
business partners to help patients get treatment for
Aortic Stenosis.
 Before joining Edwards, Andrew had over a
decade of experience working in a variety of industries,
including Life Insurance, Health Insurance, Finance, and
Direct Primary Care.
Intro: Edwards Lifesciences
 Edwards Lifesciences is the global leader in patient-focused medical innovations for
structural heart disease, as well as critical care and surgical monitoring.
 Driven by a passion to help patients, the company collaborates with the world’s
leading clinicians and researchers to address unmet healthcare needs, working to
improve patient outcomes and enhance lives.
 Edwards Lifesciences is headquartered in Irvine, CA and has 14,000 employees globally.
Transcatheter Heart Valve (THV) Division
 Transcatheter aortic valve replacement (TAVR) is a minimally invasive procedure
to treat aortic stenosis.
 In the past, many people suffering from severe aortic stenosis had limited options to
replace an unhealthy valve, such as open heart surgery. Since 2011, TAVR has
opened a door of possibilities and options for treating people in the United States with
severe aortic stenosis.
Problem statement
 Medical Background: Aortic valve stenosis that is related to increasing age and the
buildup of calcium deposits on the aortic valve is most common in older people. It
usually doesn't cause symptoms until ages 70 or 80.
 Background: Medicare was looking to revise TAVR coverage decisions and some
organizations were pushing for tighter requirements for which hospitals could
perform the procedure based on volume of other procedures.
 Problem Statement: Under different assumptions about how far a patient is
willing to travel for treatment, would a policy change have an impact on
access, and how can we show it?
Desired End Goal
 Map of the United States, color indicating
where the TAVR deserts would be.
 As granular as possible to really show the
deserts.
 Ability to adjust the distance a patient travels
 Ability to have a few scenarios of hospitals
included or not
 Overlay income and poverty information
Tools & Technical Challenge
 Tableau: Business friendly way to visually display and interface with the data
 Python: To clean & organize the data in a manner that makes it friendly for Tableau
to work with.
Technical Challenge:
 There are about 30,000 zip codes in the US, and we don’t want Tableau to redo
nearest-point calculations every time the user changes the parameters.
 A nearest-point calc across all zip pairs would mean about 450M distances recalculated
every time the user changes a parameter: (30,000 x 30,000)/2
Input Tables
Two tables needed:
 List of Hospitals and their Zip Code
– Plus the different scenarios we want to
consider
 A crosswalk of all the Zip Codes and their Lat,
Long
 Both are publicly available:
– https://public.opendatasoft.com/explore/dataset/us-zip-code-latitude-and-longitude/table/
– https://data.medicare.gov/Hospital-Compare/Hospital-General-Information/xubh-q36u
List of Hospitals
List of Zip Codes and Lat Long
First Thoughts
Initial Thoughts:
 Create a table outside of Tableau
 Color coded by how far the closest hospital is:
1. Take every zip and calculate its distance to every
other zip.
2. Mask only the hospitals (zips) we are interested in.
3. Take the minimum distance for each zip in the table.
 This means it’s (30k zips x 30k zips)/2 = 450M calcs.
Revise this:
 Only need to compare each zip against the hospital zips.
 30k zips x 800 hospital zips = 24M calcs
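The two estimates above can be checked with a few lines of Python. This is only a sketch of the arithmetic; the counts of 30,000 zips and 800 hospital zips are the rounded figures from the slides, not exact dataset sizes.

```python
from math import comb

n_zips = 30_000          # approximate zip code count used in the slides
n_hospital_zips = 800    # approximate count of distinct hospital zips

# Every unordered pair of zips (the slide rounds this to 30k x 30k / 2)
all_pairs = comb(n_zips, 2)

# Every zip against hospital zips only
zip_to_hospital = n_zips * n_hospital_zips

print(f"all zip pairs:    {all_pairs:,}")
print(f"zip-to-hospital:  {zip_to_hospital:,}")
print(f"reduction factor: {all_pairs / zip_to_hospital:.1f}x")
```

Restricting the comparison to hospital zips cuts the work by more than an order of magnitude before any smarter algorithm is applied.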
Methodology – Brute Force
Steps:
 Import and clean every hospital and every
zip
 Pair each of the 30k zips with each of the 800
possible hospital zips and calculate the distance.
 For every zip, take the minimum distance across
all of its zip-hospital pairs.
Code I – Import the Data and Basic clean up
 Import the Zip Code information as a
DataFrame
 Import the Hospitals information as a
DataFrame
– Some hospitals share the same zip
code as another hospital, so no need to
do the calc more than once
– Merge in the Lat/Long into the Hospital
DataFrame
Process
Code II – Create Master Dataframe and Calc Distance
 Create a new DataFrame with All Hospitals
Mapped to all possible Zip Codes.
– i.e., Hospital 1 will have 30,000 rows, one per zip
– There are 18M rows now
Process
 Now have a Data Frame with every hospital
and all the zips and distance from it
 Min distance along each zip
 Run the function against each row and
return the miles from it.
– Uses geopy’s geodesic function
– On an i7-8650U it takes 1 hr 25 min
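Much of that runtime comes from calling a Python function once per row. As a rough sketch of an alternative (not what the slides use), the haversine formula can be applied to whole NumPy arrays at once. It assumes a spherical Earth, so results differ slightly from geopy's ellipsoidal geodesic. The function name `haversine_miles` and the city coordinates are illustrative assumptions, not part of the source data.

```python
import numpy as np

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius, same constant as the Ball Tree code

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles; inputs in degrees, scalars or arrays."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * np.arcsin(np.sqrt(a))

# Illustrative coordinates (approximate city centers, not from the source files)
zip_lat = np.array([33.68, 40.71])     # Irvine, New York
zip_lon = np.array([-117.83, -74.01])
hos_lat = np.array([34.05])            # Los Angeles
hos_lon = np.array([-118.24])

# Broadcasting builds a (zips x hospitals) distance matrix in one call
dist = haversine_miles(zip_lat[:, None], zip_lon[:, None],
                       hos_lat[None, :], hos_lon[None, :])
nearest = dist.min(axis=1)  # closest hospital distance for each zip
```

This still computes every zip-hospital pair, so it is brute force; it only removes the per-row Python overhead.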
 Taking the clean output
table and loading it into
Tableau.
 Making the map a dual
axis map to get the
hospital locations overlaid
with the result
 Using a simple color
palette where anyone can
tell what is good or bad.
Process
Setup of Visual
 Adding in Population statistics
– Census
 Income Metrics
– IRS.gov
Process
Adding Layers of Detail
Methodologies – Faster Way
Divide into smaller problems:
 Lat, Long are just two points on a
surface.
 Just search the region near each zip
 Using scikit-learn’s K-D Tree
What’s taking so long:
 Many repetitive calculations that are
useless.
– Want to avoid calculating whether Irvine is closer
to NYC than to Los Angeles (a nearest-neighbor problem).
K-D Tree basics
[Figure: ten points labeled a–j plotted on a 0–10 by 0–10 grid, beside the matching K-D Tree. The root splits on x ≥ 6, each branch then splits on y ≥ 6, and the leaves hold the groups {a, d, e}, {b, c}, {i, f, h}, and {j, g}.]
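To make the K-D Tree idea concrete, here is a minimal pure-Python sketch of a 2-D tree with nearest-neighbor search. It is an illustration of the concept only, not scikit-learn's implementation. The names `Node`, `build`, `nearest`, and `sq_dist` and the sample points are assumptions made for this example (the slide's actual point positions are not recoverable).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]

@dataclass
class Node:
    point: Point
    axis: int                       # 0 splits on x, 1 splits on y
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def build(points: List[Point], depth: int = 0) -> Optional[Node]:
    """Recursively split the points at the median, alternating x and y."""
    if not points:
        return None
    axis = depth % 2
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return Node(points[mid], axis,
                build(points[:mid], depth + 1),
                build(points[mid + 1:], depth + 1))

def sq_dist(a: Point, b: Point) -> float:
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def nearest(node: Optional[Node], target: Point,
            best: Optional[Point] = None) -> Optional[Point]:
    """Return the stored point closest to target, skipping far branches."""
    if node is None:
        return best
    if best is None or sq_dist(node.point, target) < sq_dist(best, target):
        best = node.point
    diff = target[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, target, best)
    # Search the far side only if the splitting line is closer than the best match
    if diff ** 2 < sq_dist(best, target):
        best = nearest(far, target, best)
    return best

# Hypothetical points on a 0-10 grid
points = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1),
          (7, 2), (1, 9), (6, 8), (3, 5), (9, 9)]
tree = build(points)
closest = nearest(tree, (8.5, 5.5))
```

The pruning test in `nearest` is what avoids the useless comparisons: whole branches are skipped when they cannot contain a closer point.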
Ball Tree
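Divide into a smaller problem:
 Because these points are on a sphere (a low-dimensional
manifold), a K-D Tree on raw lat/long won’t give true distances, and
scikit-learn’s K-D Tree does not offer the haversine (great-circle) metric
 But scikit-learn’s Ball Tree does support the haversine metric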
https://towardsdatascience.com/using-scikit-learns-binary-trees-to-efficiently-find-latitude-and-longitude-neighbors-909979bd929b
https://towardsdatascience.com/tree-algorithms-explained-ball-tree-algorithm-vs-kd-tree-vs-brute-force-9746debcd940
[Figure: Ball Tree illustration using the same ten points a–j on a 0–10 by 0–10 grid.]
 Conceptually similar to K-D Trees
Code II – Add in Potential Scenarios
 Using same main data frames
as before, ZipCode and
Hospital
 Create Scenarios
 Convert Lat and Long into
Radians
Process
 Create Dataframes based on
the scenario
Code II – Ball Tree
 Using same 2 main data
frames as before
– ZipCode
– Hospital
Process
 Take the Array and put it into a
DataFrame
 Query the Ball tree against all
zips (30k).
 k=1: how many nearest sites to return
 Distances come back in radians; multiply by the
Earth’s radius (about 3,959 miles) to get miles
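A minimal sketch of the query step on a handful of points, assuming scikit-learn is installed. The city coordinates are illustrative stand-ins, not rows from the hospital or zip files. The haversine metric expects [latitude, longitude] in radians and returns distances as angles, which is why the result is scaled by the Earth's radius.

```python
import numpy as np
from sklearn.neighbors import BallTree

EARTH_RADIUS_MILES = 3958.8

# Illustrative coordinates in degrees (lat, lon); not from the source data
hospitals = np.array([[34.05, -118.24],   # Los Angeles
                      [40.71, -74.01]])   # New York
zips = np.array([[33.68, -117.83],        # Irvine
                 [42.36, -71.06]])        # Boston

# Build the tree on hospital locations, then query it with every zip
tree = BallTree(np.deg2rad(hospitals), metric="haversine")
dist_rad, idx = tree.query(np.deg2rad(zips), k=1)

miles = dist_rad[:, 0] * EARTH_RADIUS_MILES   # angle on a unit sphere -> miles
nearest_hospital = idx[:, 0]                  # row index into `hospitals`
```

With `k=1` both returned arrays have one column per query row, so taking column 0 gives one distance and one hospital index per zip.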
Code II – Final
 Run for each scenario
 Stack the results for each scenario on
top of one another.
– Tableau likes long and skinny
data
– 3 scenarios = 30k x 3 = 90k
rows
Process
Go to Tableau
 White space is your friend
 Remove grid lines &
default shading
 Use a set palette and font
size
 Each element should matter;
if it doesn’t, remove it
 Recognize the difference
between a presentation and a
dashboard
Visual Best Practices
Result and Summary
 This analysis fed into internal work to understand the different proposed Medicare
guidelines.
 It helped to understand and quantify the geographic limitations to treatment accessibility.
Other applications:
 Logistics and distribution
– Food accessibility, supply chain, etc
Edwards Lifesciences is looking for passionate data professionals, visit
www.edwards.com
Code 1 - Setup
from sklearn.neighbors import BallTree, KDTree
import numpy as np
import pandas as pd
from geopy.distance import geodesic

def get_scenario_list(df, mask, target_col):
    # returns a target_col subset as a list based on condition defined in mask
    foo_list = df[mask][target_col].to_list()
    return foo_list

# Import ZIP Code geo mapping file
# (`path` and `path2` below are directory prefixes that the slides do not define)
import_cols = ['ZipCode', 'Latitude', 'Longitude', 'ShowMap', 'City', 'State', 'Population',
'PopOver65', 'Median_Income', 'Average_Income']
df_zip = pd.read_excel(path + "ZipClean.xlsx", sheet_name="UniqueZip",
dtype={"ZipCode": str}, usecols=import_cols)
# ZIP code subset of useable zips
df_zip.drop_duplicates(['ZipCode'], keep='first', inplace=True)
df_zip.reset_index(drop=True, inplace=True)
df_zip.rename(columns={'Latitude': 'LAT', 'Longitude': 'LON'}, inplace=True)
df_zip = df_zip[np.isfinite(df_zip['LAT'])].reset_index(drop=True)
# Import Hospital File
df_hos_final = pd.read_excel(path2 + "HospitalFile.xlsx", sheet_name='Hospitals',
dtype={"Facility Zip": str})
df_hos_final.rename(columns={"Facility Zip": 'ZipCode'}, inplace=True)
# Merge Map information to hospital file to get Lat long for each hospital ZIP code
df_hos_final = pd.merge(df_hos_final, df_zip[['ZipCode', 'LAT', 'LON']], how="left", on=["ZipCode"])
# drop duplicates
df_hos = df_hos_final[['ZipCode', 'ScenerioCurrent', 'ScenerioPotential', 'LAT', 'LON']].copy()
df_hos.drop_duplicates(['ZipCode'], keep='first', inplace=True)
df_hos.reset_index(drop=True, inplace=True)
df_hos = df_hos[np.isfinite(df_hos['LAT'])].reset_index(drop=True)
Code 2 – Brute Force
# The setup code names the coordinate columns LAT/LON, so rename those
# (on copies, so df_zip and df_hos stay usable for the Ball Tree code)
df_zip_bf = df_zip.rename(columns={'LAT': 'Latz', 'LON': 'Longz'})
df_hos_bf = df_hos.rename(columns={'LAT': 'Lath', 'LON': 'Longh'})
# Create a new DataFrame with every hospital to every possible zip available (cross join)
df_dist = pd.merge(df_hos_bf.assign(key=0), df_zip_bf.assign(key=0), on='key').drop('key', axis=1)
df_dist.rename(columns={'ZipCode_x': 'Zip_Hos', 'ZipCode_y': 'Zip_Map'}, inplace=True)
# Run Distance Calc: geodesic miles between each zip and each hospital
df_dist['miles'] = df_dist.apply(
    lambda row: geodesic((row['Latz'], row['Longz']), (row['Lath'], row['Longh'])).miles,
    axis=1)
df_dist.reset_index(drop=True, inplace=True)
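The brute-force code stops at the full pairwise table. The remaining step from the slides, taking the minimum distance for each zip, might look like the sketch below. The toy rows reuse the `Zip_Hos`, `Zip_Map`, and `miles` column names from the code above, but the values are made up for illustration, and `df_pairs` and `df_min` are names assumed here.

```python
import pandas as pd

# Toy stand-in for the cross-joined table: one row per (hospital zip, map zip) pair
df_pairs = pd.DataFrame({
    "Zip_Hos": ["90001", "10001", "90001", "10001"],
    "Zip_Map": ["92602", "92602", "02108", "02108"],
    "miles":   [35.0, 2440.0, 2590.0, 190.0],
})

# Row label of the closest hospital for each map zip, then keep only those rows
closest_idx = df_pairs.groupby("Zip_Map")["miles"].idxmin()
df_min = df_pairs.loc[closest_idx].reset_index(drop=True)
```

Using `idxmin` rather than `min` keeps the matching `Zip_Hos` value, so the result records which hospital is closest as well as how far away it is.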
Code 3 – Nearest Neighbor (Ball Tree)
scenario_cur_mask = df_hos['ScenerioCurrent'] == 1
scenario_pot_mask = df_hos['ScenerioPotential'] == 1
# Use dictionary to store all of the scenario lists. This will be iterated through below
scenarios_dict = { 'cur': get_scenario_list(df_hos, scenario_cur_mask, 'ZipCode'),
'pot': get_scenario_list(df_hos, scenario_pot_mask, 'ZipCode')
}
# Creates new columns converting coordinate degrees to radians (both dfs)
for df in [df_hos, df_zip]:
    for col in ['LAT', 'LON']:
        rad = np.deg2rad(df[col].values)
        df[f'{col}_rad'] = rad
# loop through each scenario in scenarios_dict, output BallTree nearest neighbor distance to closest hospital for every U.S. ZIP code
# add a flag for which scenario it is
# output for each scenario will be same length as df_zip
df_dist = pd.DataFrame()
for scenario in scenarios_dict:
    # subset df_hos by each scenario
    zip_list = scenarios_dict[scenario]
    locations_a = df_hos[df_hos['ZipCode'].isin(zip_list)].copy()
    locations_b = df_zip.copy()
    # BallTree nearest neighbor distance (haversine returns radians; scale by Earth's radius in miles)
    ball = BallTree(locations_a[["LAT_rad", "LON_rad"]].values, metric='haversine')
    distances, indices = ball.query(locations_b[['LAT_rad', 'LON_rad']].values, k=1)
    distances = distances * 3958.8
    # get distances into a df, concatenate
    df_temp = pd.DataFrame(data=distances, index=locations_b['ZipCode'])
    df_temp.rename(columns={0: 'Miles'}, inplace=True)
    df_temp.reset_index(drop=False, inplace=True)
    df_temp['scenario'] = scenario
    df_dist = pd.concat([df_dist, df_temp], ignore_index=True)
df_dist.reset_index(drop=True, inplace=True)
# merge in map data
cols_zip = ['ZipCode', 'City', 'State', 'Population', 'PopOver65', 'Median_Income', 'Average_Income']
df_dist = pd.merge(df_dist, df_zip[cols_zip], how="left", on=["ZipCode"])
# cols_hos = ['ZipCode', 'Facility Name', 'ScenerioPotential', 'ScenerioCurrent']
# df_dist = pd.merge(df_dist, df_hos_final[cols_hos], how="left", on=["ZipCode"])
df_dist.to_csv(path2 + "ZipDist_Hos_Sklearn.csv", index=False, header=True, encoding='utf8')
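The slides leave the distance threshold to Tableau, but the same comparison can be sketched in pandas. The example below assumes a table shaped like the exported file (`ZipCode`, `scenario`, `Miles`, `PopOver65`). The rows are made-up values, and `desert_share` is a hypothetical helper, not part of the original code.

```python
import pandas as pd

# Toy rows shaped like the exported table (values are made up for illustration)
df_out = pd.DataFrame({
    "ZipCode":   ["00001", "00002", "00003", "00001", "00002", "00003"],
    "scenario":  ["cur", "cur", "cur", "pot", "pot", "pot"],
    "Miles":     [12.0, 80.0, 150.0, 12.0, 140.0, 150.0],
    "PopOver65": [1000, 500, 200, 1000, 500, 200],
})

def desert_share(df: pd.DataFrame, max_miles: float) -> pd.Series:
    """Share of the 65+ population living farther than max_miles, per scenario."""
    beyond = df["PopOver65"].where(df["Miles"] > max_miles, 0)
    total = df.groupby("scenario")["PopOver65"].sum()
    return beyond.groupby(df["scenario"]).sum() / total

share_100 = desert_share(df_out, max_miles=100)
```

Calling `desert_share` with different `max_miles` values mirrors the adjustable travel-distance parameter described in the desired end goal.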

Genuine love spell caster )! ,+27834335081) Ex lover back permanently in At...
 
Bios of leading Astrologers & Researchers
Bios of leading Astrologers & ResearchersBios of leading Astrologers & Researchers
Bios of leading Astrologers & Researchers
 
Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...
Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...
Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...
 
如何办理英国卡迪夫大学毕业证(Cardiff毕业证书)成绩单留信学历认证
如何办理英国卡迪夫大学毕业证(Cardiff毕业证书)成绩单留信学历认证如何办理英国卡迪夫大学毕业证(Cardiff毕业证书)成绩单留信学历认证
如何办理英国卡迪夫大学毕业证(Cardiff毕业证书)成绩单留信学历认证
 
如何办理(UCLA毕业证书)加州大学洛杉矶分校毕业证成绩单学位证留信学历认证原件一样
如何办理(UCLA毕业证书)加州大学洛杉矶分校毕业证成绩单学位证留信学历认证原件一样如何办理(UCLA毕业证书)加州大学洛杉矶分校毕业证成绩单学位证留信学历认证原件一样
如何办理(UCLA毕业证书)加州大学洛杉矶分校毕业证成绩单学位证留信学历认证原件一样
 
NOAM AAUG Adobe Summit 2024: Summit Slam Dunks
NOAM AAUG Adobe Summit 2024: Summit Slam DunksNOAM AAUG Adobe Summit 2024: Summit Slam Dunks
NOAM AAUG Adobe Summit 2024: Summit Slam Dunks
 
如何办理加州大学伯克利分校毕业证(UCB毕业证)成绩单留信学历认证
如何办理加州大学伯克利分校毕业证(UCB毕业证)成绩单留信学历认证如何办理加州大学伯克利分校毕业证(UCB毕业证)成绩单留信学历认证
如何办理加州大学伯克利分校毕业证(UCB毕业证)成绩单留信学历认证
 
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...
 
Data Visualization Exploring and Explaining with Data 1st Edition by Camm sol...
Data Visualization Exploring and Explaining with Data 1st Edition by Camm sol...Data Visualization Exploring and Explaining with Data 1st Edition by Camm sol...
Data Visualization Exploring and Explaining with Data 1st Edition by Camm sol...
 
How to Transform Clinical Trial Management with Advanced Data Analytics
How to Transform Clinical Trial Management with Advanced Data AnalyticsHow to Transform Clinical Trial Management with Advanced Data Analytics
How to Transform Clinical Trial Management with Advanced Data Analytics
 
NO1 Best Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialist I...
NO1 Best Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialist I...NO1 Best Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialist I...
NO1 Best Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialist I...
 
Abortion pills in Riyadh Saudi Arabia (+966572737505 buy cytotec
Abortion pills in Riyadh Saudi Arabia (+966572737505 buy cytotecAbortion pills in Riyadh Saudi Arabia (+966572737505 buy cytotec
Abortion pills in Riyadh Saudi Arabia (+966572737505 buy cytotec
 
1:1原版定制伦敦政治经济学院毕业证(LSE毕业证)成绩单学位证书留信学历认证
1:1原版定制伦敦政治经济学院毕业证(LSE毕业证)成绩单学位证书留信学历认证1:1原版定制伦敦政治经济学院毕业证(LSE毕业证)成绩单学位证书留信学历认证
1:1原版定制伦敦政治经济学院毕业证(LSE毕业证)成绩单学位证书留信学历认证
 
Identify Rules that Predict Patient’s Heart Disease - An Application of Decis...
Identify Rules that Predict Patient’s Heart Disease - An Application of Decis...Identify Rules that Predict Patient’s Heart Disease - An Application of Decis...
Identify Rules that Predict Patient’s Heart Disease - An Application of Decis...
 
Aggregations - The Elasticsearch "GROUP BY"
Aggregations - The Elasticsearch "GROUP BY"Aggregations - The Elasticsearch "GROUP BY"
Aggregations - The Elasticsearch "GROUP BY"
 
社内勉強会資料_Object Recognition as Next Token Prediction
社内勉強会資料_Object Recognition as Next Token Prediction社内勉強会資料_Object Recognition as Next Token Prediction
社内勉強会資料_Object Recognition as Next Token Prediction
 

Healthcare deserts: How accessible is US healthcare?

  • 1. Healthcare deserts: How accessible is US healthcare? Andrew Kaszpurenko
  • 2. Agenda
      - Introduction: About me and what I do
      - Data for Good: Mapping US Healthcare Desert
      - Data Sources
      - Methodologies – Brute Force
      - Visualization
      - Methodologies – Smarter way
      - Result and Summary
  • 3. Intro: Andrew Kaszpurenko
      - Manager of Advanced Analytics at Edwards Lifesciences THV Division for the last three years.
      - Leads a team that uses a variety of Machine Learning & AI methods to build models that inform leadership and business partners, helping patients get treatment for Aortic Stenosis.
      - Before joining Edwards, Andrew had over a decade of experience in a variety of industries, including Life Insurance, Health Insurance, Finance, and Direct Primary Care.
  • 4. Intro: Edwards Lifesciences
      Edwards Lifesciences
      - Edwards Lifesciences is the global leader in patient-focused medical innovations for structural heart disease, as well as critical care and surgical monitoring.
      - Driven by a passion to help patients, the company collaborates with the world’s leading clinicians and researchers to address unmet healthcare needs, working to improve patient outcomes and enhance lives.
      - Edwards Lifesciences is headquartered in Irvine, CA, and has 14,000 employees globally.
      Transcatheter Heart Valve (THV) Division
      - The minimally invasive procedure used to treat aortic stenosis is called transcatheter aortic valve replacement (TAVR).
      - In the past, many people suffering from severe aortic stenosis had limited options to replace an unhealthy valve, such as open heart surgery. Since 2011, TAVR has opened a door of possibilities and options for treating people in the United States with severe aortic stenosis.
  • 5. Problem statement
      - Medical Background: Aortic valve stenosis that is related to increasing age and the buildup of calcium deposits on the aortic valve is most common in older people. It usually doesn't cause symptoms until ages 70 or 80.
      - Background: Medicare was looking to revise TAVR coverage decisions, and some organizations were pushing for tighter requirements on which hospitals could perform the procedure, based on their volume of other procedures.
      - Problem Statement: Under different assumptions about how far a patient is willing to travel for treatment, how can we show whether a policy change would have an impact on access?
  • 6. Desired End Goal
      - Map of the United States, with color indicating where the TAVR deserts would be.
      - As granular as possible to really show the deserts.
      - Ability to adjust the distance a patient travels.
      - Ability to have a few scenarios of hospitals included or not.
      - Overlay income and poverty information.
  • 7. Tools & Technical Challenge
      - Tableau: Business-friendly way to visually display and interface with the data.
      - Python: To clean and organize the data in a manner that makes it friendly for Tableau to work with.
      Technical Challenge:
      - There are 30,000 zip codes in the US, and we don’t want Tableau to redo nearest-point calculations every time the user changes the parameters.
      - A full nearest-point calculation would mean about 450M distance calculations every time the user changes a parameter: (30,000 x 30,000)/2.
  • 8. Input Tables
      Two tables needed:
      - List of Hospitals and their Zip Code, plus the different scenarios we want to consider
      - A crosswalk of all the Zip Codes and their Lat, Long
      - Both are publicly available:
          - https://public.opendatasoft.com/explore/dataset/us-zip-code-latitude-and-longitude/table/
          - https://data.medicare.gov/Hospital-Compare/Hospital-General-Information/xubh-q36u
      [Screenshots: List of Hospitals; List of Zip Codes and Lat Long]
  • 9. First Thoughts
      Initial Thoughts:
      - Create a table outside of Tableau.
      - Color coded by how far the closest hospital is:
          1. Take every zip and calculate its distance to every other zip.
          2. Mask only the hospitals (zips) we are interested in.
          3. Take the minimum distance for each zip in the table.
      - This means it’s (30k zips x 30k zips)/2 = 450M calcs.
      Revise this:
      - Only the hospital zips need to be compared against.
      - 30k zips x 800 hospital zips = 24M calcs
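The arithmetic on this slide can be checked with a few lines of Python. This is only a sketch using the slide's round figures (30,000 ZIP codes, 800 hospital ZIP codes); the variable names are illustrative and not taken from the deck's code.

```python
# Round figures quoted on the slide, not exact counts.
n_zips = 30_000
n_hospital_zips = 800

# Every ZIP against every other ZIP, counting each pair once.
all_pairs = n_zips * n_zips // 2

# Every ZIP against hospital ZIPs only.
hospital_pairs = n_zips * n_hospital_zips

print(f"all pairs:      {all_pairs:,}")
print(f"hospital pairs: {hospital_pairs:,}")
print(f"reduction:      {all_pairs / hospital_pairs:.2f}x")
```

Restricting the comparison to hospital ZIPs cuts the work by a bit under 19x before any smarter search structure is introduced.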
  • 10. Methodology – Brute Force
      Steps:
      - Import and clean every hospital and every zip.
      - Take each of the 30k zips, give it 800 possible locations to go to, and calculate the distance to each.
      - Take the minimum distance for every zip across all of its zip-hospital pairs.
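The three brute-force steps can be sketched in plain Python as follows. Note the assumptions: the deck's own code uses geopy's geodesic distance on pandas DataFrames, whereas this sketch swaps in the haversine formula so it runs with the standard library alone, and the ZIP codes and coordinates below are approximate sample values chosen for illustration.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius, same constant as the deck's Ball Tree code

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    p1, p2 = radians(lat1), radians(lat2)
    dlon = radians(lon2 - lon1)
    a = sin((p2 - p1) / 2) ** 2 + cos(p1) * cos(p2) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(a))

# Hypothetical sample data: ZIP -> (lat, lon), approximate coordinates.
zips = {"92614": (33.68, -117.83), "10001": (40.75, -73.99), "60601": (41.89, -87.62)}
hospital_zips = {"90095": (34.07, -118.44), "10016": (40.75, -73.98)}

def nearest_hospital(zips, hospital_zips):
    """For every ZIP, compute the distance to every hospital ZIP and keep the minimum."""
    result = {}
    for z, (lat, lon) in zips.items():
        result[z] = min(
            (haversine_miles(lat, lon, hlat, hlon), h)
            for h, (hlat, hlon) in hospital_zips.items()
        )
    return result

nearest = nearest_hospital(zips, hospital_zips)
for z, (miles, h) in nearest.items():
    print(f"{z}: nearest hospital ZIP {h}, about {miles:.0f} miles")
```

The cost grows with the product of the two table sizes, which is what motivates the tree-based approach later in the deck.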
  • 11. Code I – Import the Data and Basic clean up
      Process:
      - Import the Zip Code information as a DataFrame.
      - Import the Hospitals information as a DataFrame.
          - Some hospitals share the same zip code as another hospital, so there is no need to do the calc more than once.
          - Merge the Lat/Long into the Hospital DataFrame.
  • 12. Code II – Create Master Dataframe and Calc Distance
      Process:
      - Create a new DataFrame with all hospitals mapped to all possible Zip Codes.
          - i.e., Hospital 1 will have 30,000 rows
          - There are 18M rows now
      - Run the distance function against each row and return the miles.
          - Uses geopy’s geodesic function
          - On an i7-8650U it takes 1 hr 25 mins
      - Now have a DataFrame with every hospital, every zip, and the distance between them.
      - Take the minimum distance for each zip.
  • 13. Setup of Visual
      Process:
      - Take the clean output table and load it into Tableau.
      - Make the map a dual-axis map to get the hospital locations overlaid on the result.
      - Use a simple color palette where anyone can tell what is good or bad.
  • 14. Adding Layers of Detail
      Process:
      - Adding in population statistics (Census)
      - Income metrics (IRS.gov)
  • 15. Methodologies – Faster Way
      What’s taking so long:
      - Many repetitive calculations that are useless.
          - We want to avoid calculating whether Irvine is closer to NYC than to Los Angeles.
      Nearest Neighbor: divide into smaller problems
      - Lat, Long are just two coordinates of a point on a surface.
      - Just search the region around each point.
      - Using the scikit-learn K-D Tree
  • 16. K-D Tree basics
      Divide into a smaller problem:
      - Lat, Long are just two coordinates of a point on a surface.
      - Divide into smaller problems.
      - Using the scikit-learn K-D Tree
      [Figure: K-D Tree. Points a-j plotted on a 0-10 by 0-10 grid; the tree splits first on x ≥ 6 and then on y ≥ 6, partitioning the points into groups such as {a, b, c, d, e} and {f, g, h, i, j}, then {a, d, e} and {b, c}.]
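To make the splitting idea concrete, here is a minimal two-dimensional K-D tree with a nearest-neighbor query, written in plain Python. It is an illustrative sketch rather than the scikit-learn implementation: it splits at the median on alternating axes (the slide's figure splits at fixed thresholds), and the sample points are made up.

```python
def build_kdtree(points, depth=0):
    """Recursively split the points at the median, alternating between x and y."""
    if not points:
        return None
    axis = depth % 2
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {
        "point": points[mid],
        "axis": axis,
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }

def sq_dist(a, b):
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def nearest(node, target, best=None):
    """Return the stored point closest to target, skipping branches that cannot win."""
    if node is None:
        return best
    if best is None or sq_dist(node["point"], target) < sq_dist(best, target):
        best = node["point"]
    axis = node["axis"]
    diff = target[axis] - node["point"][axis]
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    best = nearest(near, target, best)
    # Only cross the splitting line if a closer point could lie on the other side.
    if diff ** 2 < sq_dist(best, target):
        best = nearest(far, target, best)
    return best

# Made-up sample points on a 0-10 grid.
points = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build_kdtree(points)
print(nearest(tree, (9, 2)))
```

The pruning test in `nearest` is what removes the "useless" comparisons: whole regions of the map are skipped once a close enough candidate has been found.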
  • 17. Ball Tree
      Divide into a smaller problem:
      - Because these points are on a sphere (a low-dimensional manifold), a K-D Tree on raw Lat, Long won’t work.
      - But can use the Ball Tree from scikit-learn, which supports the haversine (great-circle) metric.
      - Conceptually similar to K-D Trees.
      [Figure: Ball Tree. Points a-j plotted on a 0-10 by 0-10 grid.]
      References:
      - https://towardsdatascience.com/using-scikit-learns-binary-trees-to-efficiently-find-latitude-and-longitude-neighbors-909979bd929b
      - https://towardsdatascience.com/tree-algorithms-explained-ball-tree-algorithm-vs-kd-tree-vs-brute-force-9746debcd940
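One way to see why latitude and longitude cannot be treated as flat x/y coordinates is to rank two candidate points both ways. The sketch below is an illustration under made-up coordinates, not part of the original deck: measured in raw degrees, candidate B looks closer, but in great-circle miles candidate A is closer, because a degree of longitude shrinks as latitude increases.

```python
from math import asin, cos, hypot, radians, sin, sqrt

EARTH_RADIUS_MILES = 3958.8

def haversine_miles(a, b):
    """Great-circle distance in miles between (lat, lon) pairs given in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (a[0], a[1], b[0], b[1]))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(h))

def degree_distance(a, b):
    """Naive flat distance that treats degrees of lat/lon as x/y units."""
    return hypot(a[0] - b[0], a[1] - b[1])

# Made-up points at a high latitude, where the distortion is large.
origin = (60.0, 0.0)
candidates = {"A": (60.0, 2.0), "B": (61.5, 0.0)}

by_degrees = min(candidates, key=lambda k: degree_distance(origin, candidates[k]))
by_haversine = min(candidates, key=lambda k: haversine_miles(origin, candidates[k]))

print("nearest by raw degrees:", by_degrees)
print("nearest by haversine:  ", by_haversine)
```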
  • 18. Code II – Add in Potential Scenarios
      Process:
      - Using the same main DataFrames as before, ZipCode and Hospital
      - Create scenarios
      - Convert Lat and Long into radians
      - Create DataFrames based on the scenario
  • 19. Code II – Ball Tree
      Process:
      - Using the same 2 main DataFrames as before:
          - ZipCode
          - Hospital
      - Query the Ball Tree against all zips (30k).
      - k=1 sets how many nearest sites to return.
      - Take the returned array and put it into a DataFrame.
      - Distances come back in radians; 1 radian of arc on the Earth's surface is about 3,959 miles.
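The radians-to-miles step can be sketched on its own. A haversine query returns the angle between two points as seen from the Earth's center, so multiplying by the Earth's radius gives miles, and dividing a travel limit in miles by the radius gives the equivalent angle. The sample values and helper names below are hypothetical.

```python
EARTH_RADIUS_MILES = 3958.8  # constant used in the deck's Ball Tree code

def radians_to_miles(angle_rad):
    """Convert a great-circle angle in radians to surface miles."""
    return angle_rad * EARTH_RADIUS_MILES

def miles_to_radians(miles):
    """Convert surface miles to the equivalent great-circle angle in radians."""
    return miles / EARTH_RADIUS_MILES

# Hypothetical query output: one angular distance per ZIP code.
distances_rad = [0.001, 0.0125, 0.05]
distances_miles = [radians_to_miles(d) for d in distances_rad]

# A hypothetical 50-mile travel limit expressed in the tree's own units.
limit_rad = miles_to_radians(50)

for rad, miles in zip(distances_rad, distances_miles):
    print(f"{rad:.4f} rad = {miles:6.1f} miles, within 50 miles: {rad <= limit_rad}")
```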
  • 20. Code II – Final
      Process:
      - Run for each scenario.
      - Stack the results on top of each other, one block per scenario.
          - Tableau likes long and skinny data
          - 3 scenarios = 30k x 3 = 90k rows
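The "long and skinny" layout can be sketched without pandas. In this illustration the per-scenario results are made-up values; the column names ZipCode, Miles, and scenario follow the deck's Ball Tree code, which does the same stacking with `pd.concat`.

```python
# Hypothetical per-scenario results: ZIP -> miles to the nearest hospital.
results = {
    "cur": {"92614": 12.4, "59001": 85.0},
    "pot": {"92614": 12.4, "59001": 140.2},
}

# One row per (scenario, ZIP) pair instead of one column per scenario.
long_rows = [
    {"ZipCode": z, "Miles": miles, "scenario": scenario}
    for scenario, by_zip in results.items()
    for z, miles in by_zip.items()
]

for row in long_rows:
    print(row)
```

With this shape, a single scenario filter in the visualization tool selects one block of rows, and adding a scenario only adds rows rather than changing the schema.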
  • 21. Go to Tableau
  • 22. Visual Best Practices
      - White space is your friend.
      - Remove grid lines and default shading.
      - Use a set palette and font size.
      - Each element should matter; if it doesn't, remove it.
      - Recognize the difference between a presentation and a dashboard.
  • 23. Result and Summary
      - This analysis contributed to internal analysis to understand different proposed Medicare guidelines.
      - It helped to understand and quantify the geographic limitations to treatment accessibility.
      Other applications:
      - Logistics and distribution: food accessibility, supply chain, etc.
      Edwards Lifesciences is looking for passionate data professionals; visit www.edwards.com
  • 24.
  • 25. Code 1 - Setup

      from sklearn.neighbors import BallTree, KDTree
      import numpy as np
      import pandas as pd
      from geopy.distance import geodesic

      # Note: path and path2 are directory prefixes that are not defined on the slide.

      def get_scenario_list(df, mask, target_col):
          # returns a target_col subset as a list based on condition defined in mask
          foo_list = df[mask][target_col].to_list()
          return foo_list

      # Import ZIP Code geo mapping file
      import_cols = ['ZipCode', 'Latitude', 'Longitude', 'ShowMap', 'City', 'State',
                     'Population', 'PopOver65', 'Median_Income', 'Average_Income']
      df_zip = pd.read_excel(path + "ZipClean.xlsx", sheet_name="UniqueZip",
                             dtype={"ZipCode": str}, usecols=import_cols)

      # ZIP code subset of useable zips
      df_zip.drop_duplicates(['ZipCode'], keep='first', inplace=True)
      df_zip.reset_index(drop=True, inplace=True)
      df_zip.rename(columns={'Latitude': 'LAT', 'Longitude': 'LON'}, inplace=True)
      df_zip = df_zip[np.isfinite(df_zip['LAT'])].reset_index(drop=True)

      # Import Hospital File
      df_hos_final = pd.read_excel(path2 + "HospitalFile.xlsx", sheet_name='Hospitals',
                                   dtype={"Facility Zip": str})
      df_hos_final.rename(columns={"Facility Zip": 'ZipCode'}, inplace=True)

      # Merge Map information to hospital file to get Lat long for each hospital ZIP code
      df_hos_final = pd.merge(df_hos_final, df_zip[['ZipCode', 'LAT', 'LON']],
                              how="left", on=["ZipCode"])

      # drop duplicates
      df_hos = df_hos_final[['ZipCode', 'ScenerioCurrent', 'ScenerioPotential', 'LAT', 'LON']].copy()
      df_hos.drop_duplicates(['ZipCode'], keep='first', inplace=True)
      df_hos.reset_index(drop=True, inplace=True)
      df_hos = df_hos[np.isfinite(df_hos['LAT'])].reset_index(drop=True)
  • 26. Code 2 – Brute Force

      # Give the ZIP and hospital coordinates distinct names before the cross join.
      # Code 1 names these columns LAT/LON, so those are the names renamed here;
      # renaming copies leaves df_zip and df_hos unchanged for Code 3.
      zip_pts = df_zip.rename(columns={'LAT': 'Latz', 'LON': 'Longz'})
      hos_pts = df_hos.rename(columns={'LAT': 'Lath', 'LON': 'Longh'})

      # Create a new DataFrame with every hospital to every possible zip available
      df_dist = pd.merge(hos_pts.assign(key=0), zip_pts.assign(key=0), on='key').drop('key', axis=1)
      df_dist.rename(columns={'ZipCode_x': 'Zip_Hos', 'ZipCode_y': 'Zip_Map'}, inplace=True)

      # Run Distance Calc (geodesic miles between each ZIP and each hospital ZIP)
      df_dist['miles'] = df_dist.apply(
          (lambda row: geodesic((row['Latz'], row['Longz']), (row['Lath'], row['Longh'])).miles),
          axis=1)
      df_dist.reset_index(drop=True, inplace=True)
  • 27. Code 3 – Nearest Neighbor (Ball Tree)

      scenario_cur_mask = df_hos['ScenerioCurrent'] == 1
      scenario_pot_mask = df_hos['ScenerioPotential'] == 1

      # Use dictionary to store all of the scenario lists. This will be iterated through below
      scenarios_dict = {
          'cur': get_scenario_list(df_hos, scenario_cur_mask, 'ZipCode'),
          'pot': get_scenario_list(df_hos, scenario_pot_mask, 'ZipCode')
      }

      # Creates new columns converting coordinate degrees to radians (both dfs)
      for df in [df_hos, df_zip]:
          for col in ['LAT', 'LON']:
              rad = np.deg2rad(df[col].values)
              df[f'{col}_rad'] = rad

      # loop through each scenario in scenarios_dict, output BallTree nearest neighbor
      # distance to closest hospital for every U.S. ZIP code
      # add a flag for which scenario it is
      # output for each scenario will be same length as df_zip
      df_dist = pd.DataFrame()
      for scenario in scenarios_dict:
          # subset df_hos by each scenario
          zip_list = scenarios_dict[scenario]
          locations_a = df_hos[df_hos['ZipCode'].isin(zip_list)].copy()
          locations_b = df_zip.copy()

          # BallTree nearest neighbor distance (haversine returns radians; convert to miles)
          ball = BallTree(locations_a[["LAT_rad", "LON_rad"]].values, metric='haversine')
          distances, indices = ball.query(locations_b[['LAT_rad', 'LON_rad']].values, k=1)
          distances = distances * 3958.8

          # get distances into a df, concatenate
          df_temp = pd.DataFrame(data=distances, index=locations_b['ZipCode'])
          df_temp.rename(columns={0: 'Miles'}, inplace=True)
          df_temp.reset_index(drop=False, inplace=True)
          df_temp['scenario'] = scenario
          df_dist = pd.concat([df_dist, df_temp], ignore_index=True)

      df_dist.reset_index(drop=True, inplace=True)

      # merge in map data
      cols_zip = ['ZipCode', 'City', 'State', 'Population', 'PopOver65',
                  'Median_Income', 'Average_Income']
      df_dist = pd.merge(df_dist, df_zip[cols_zip], how="left", on=["ZipCode"])
      # cols_hos = ['ZipCode', 'Facility Name', 'ScenerioPotential', 'ScenerioCurrent']
      # df_dist = pd.merge(df_dist, df_hos_final[cols_hos], how="left", on=["ZipCode"])

      df_dist.to_csv(path2 + "ZipDist_Hos_Sklearn.csv", index=False, header=True, encoding='utf8')