Survival Analysis of Stroke Patients Based on Different Insurance Plans Using SAS Enterprise
Guide™ and SAS Enterprise Miner™
Ajay Ganesan, Adithya Seshadri Sundararajan
MS in MIS, SAS and OSU Data Mining Certified, Oklahoma State University
ABSTRACT
The number of stroke patients in the US is rising, and the number of insured individuals has also gone up
dramatically over the past few years. Insurance plans minimize the financial burden on patients by covering
most medical costs and will become indispensable in the coming years. This paper summarizes an analysis
of the effectiveness of the different insurance plans used by stroke patients. By predicting a patient's survival
probability, insurance companies can identify high-risk patients with similar characteristics in the historical data,
charge them appropriately, market their products strategically, and reduce the risk of paying insurance claims by
concentrating on the patients with the highest hazard.
Using SAS Enterprise Guide and SAS Enterprise Miner, this study determines the total charges incurred by patients with
different insurance plans, based on metrics such as length of stay, expiration status, age, marital status, gender, and
census region. The analysis was performed with the approval of OSU’s Center for Health Systems Innovation (CHSI), whose
data set covers 26,181 US patients with cerebral artery occlusion (ICD-9 code 434.91) from 1999 to 2013.
METHODS
Data Analysis and Preparation
This analysis uses Cerner’s stroke data for cerebral artery occlusion, which has 26,181
records with 18 variables. The study considers stroke patients aged 65 and above when comparing and
analyzing their insurance plans. Initially, the data contained many redundant variables and variables holding only
descriptive information; these were not useful for the analysis and were removed. A few records were also found to
contain null values and were excluded from the analysis.
Length of stay was calculated as the difference between the discharge date and time and the admission
date and time. A new variable, ‘DISC_Expired’, was created to record the status of each patient (expired
or alive), and another new variable, ‘Charge Category’, was derived from the total charges to support the survival
analysis. The study follows the Cross-Industry Standard Process for Data Mining (CRISP-DM) methodology, which is
illustrated in the accompanying figure.
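The derived variables described above can be sketched in a few lines. This is a Python illustration rather than the SAS Enterprise Guide steps used in the study. The timestamp format, the "expired" disposition label, and the charge thresholds are all assumptions, since the poster does not state them.

```python
from datetime import datetime

# Hypothetical upper bounds (USD) for the charge categories; the poster
# does not report the cut-offs actually used.
CHARGE_BINS = [(10_000, "Low"), (50_000, "Medium")]

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M"  # assumed layout of the date & time fields


def length_of_stay_days(admitted: str, discharged: str) -> float:
    """Length of stay in days: discharge date & time minus admission date & time."""
    delta = (datetime.strptime(discharged, TIMESTAMP_FORMAT)
             - datetime.strptime(admitted, TIMESTAMP_FORMAT))
    return delta.total_seconds() / 86_400


def disc_expired(discharge_disposition: str) -> int:
    """1 if the patient expired, 0 if alive (the disposition label is assumed)."""
    return 1 if discharge_disposition.strip().lower() == "expired" else 0


def charge_category(total_charges: float) -> str:
    """Bucket total charges into a category using the hypothetical bins above."""
    for upper_bound, label in CHARGE_BINS:
        if total_charges < upper_bound:
            return label
    return "High"


if __name__ == "__main__":
    stay = length_of_stay_days("2010-01-01 08:00", "2010-01-04 20:00")
    print(stay, disc_expired("Expired"), charge_category(23_500.0))
```

Keeping the thresholds in one table makes it easy to swap in the real cut-offs once they are known.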
Data Exploration
Based on the descriptive analysis, we can conclude the following:
• The CHAMPUS insurance plan has one of the lowest total charges but one of the highest expiration rates
• The Self-insured/Self Pay group incurs high total charges because these patients have no proper insurance plan
and, unlike insurance companies, are not associated partners of the hospitals and medical centers
• Though Medicare and Medicaid have some of the highest mean total charges, they have become some of the riskiest
insurance plans because more of their patients expired than under other plans
• The South has the highest percentage of patients who expired (15.81%), followed by the Midwest (14.45%). The
Northeast has the lowest percentage (10.72%)
• Among private plans (BCBS/HMO/PPO), PPO has the lowest expiration rate, though the total charges incurred
under this plan are the highest of all private plans
• An insurance customer with marital status ‘Single’ has a higher expiration rate due to stroke (14.42%)
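Group-level figures like the expiration rates above come from simple aggregation. The sketch below shows the calculation in Python. The field names `payer`, `disc_expired`, and `total_charges` are hypothetical, and the sample records are made up for illustration; they are not values from the study.

```python
from collections import defaultdict


def expiration_rate_by(records, key):
    """Percentage of expired patients for each level of `key`."""
    totals = defaultdict(int)
    expired = defaultdict(int)
    for record in records:
        totals[record[key]] += 1
        expired[record[key]] += record["disc_expired"]
    return {level: 100.0 * expired[level] / totals[level] for level in totals}


def mean_charges_by(records, key):
    """Mean total charges for each level of `key`."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for record in records:
        sums[record[key]] += record["total_charges"]
        counts[record[key]] += 1
    return {level: sums[level] / counts[level] for level in sums}


if __name__ == "__main__":
    # Made-up records, for illustration only.
    sample = [
        {"payer": "PPO", "disc_expired": 0, "total_charges": 42_000.0},
        {"payer": "PPO", "disc_expired": 0, "total_charges": 38_000.0},
        {"payer": "Self Pay", "disc_expired": 1, "total_charges": 51_000.0},
        {"payer": "Self Pay", "disc_expired": 0, "total_charges": 47_000.0},
    ]
    print(expiration_rate_by(sample, "payer"))
    print(mean_charges_by(sample, "payer"))
```

The same two functions apply unchanged to census region or marital status by passing a different `key`.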
RESULTS
Recommendations
• Self-insured/Self Pay is quite expensive for the service patients receive, and the percentage of patient expiration
(15.68%) is quite high. Insurance companies can therefore devise marketing strategies and attractive offers to bring
these people under insurance.
• CHAMPUS (military insurance) is one of the cheapest options but has a higher rate of expiration.
• Among private plans, BCBS is economical and does not have risky expiration outcomes, but PPO is the
safest private plan, with the lowest expiration rate
• Though Medicaid is one of the most economical insurance plans, its risky expiration outcomes and longer
length of stay make it a “not-so-attractive” option. Medicare has one of the lowest mean total charges and
some of the riskiest expiration outcomes
• The Midwest region has the largest number of expired patients, so insurance companies can come up with
more attractive offers that address the needs of these patients (for example, facilities to decrease mortality rates)
• Patients who stay in the hospital for up to 20 days need to be charged more, since their risk of expiration is higher
• The analysis shows that the risk posed to insurance companies increases with age. Companies should therefore
set an age limit of 74 years for providing insurance, since the expiration rate spikes after 75 years of age
REFERENCES
• Katherine Baicker, Sarah Taubman, Heidi Allen, Mira Bernstein, Jonathan Gruber, Joseph P. Newhouse, Eric
Schneider, Bill Wright, Alan Zaslavsky, Amy Finkelstein, and the Oregon Health Study Group, "The Oregon
Experiment – Effects of Medicaid on Clinical Outcomes", New England Journal of Medicine, 2013 May; 368(18):
1713-1722.
• http://www.rand.org/blog/2014/04/survey-estimates-net-gain-of-9-3-million-american-adults.html
Acknowledgement
• We thank Dr. Goutam Chakraborty, Professor in the Department of Marketing and founder of the SAS and OSU Data Mining
Certificate program at Oklahoma State University, for his guidance and constant support throughout this study.
Survival Analysis
The survival analysis used 70% of the data for training and 30% for validation. The survival analysis
node was configured with two time ID variables (the patient's admission date and discharge date), a time
interval of ‘Day’, and the binary target variable ‘Disc Expired’ (1 = expired, 0 = not expired).
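The 70/30 partition can be illustrated with a simple seeded random split. This Python sketch is not the partitioning used in SAS Enterprise Miner; the function name, the seed, and the lack of stratification are assumptions made for the example.

```python
import random


def train_validation_split(records, train_fraction=0.7, seed=42):
    """Shuffle a copy of the records and cut it into training and validation parts."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(records)   # copy, so the caller's data keeps its order
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    patient_ids = list(range(1, 11))  # stand-in for patient records
    train, validation = train_validation_split(patient_ids)
    print(len(train), len(validation))
```

With a rare outcome such as expiration, a split stratified on the target would keep the event rate similar in both parts; the plain shuffle here is only the simplest version.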
• The accompanying figures show that patients who stay up to 20 days pose the maximum risk to insurance companies,
as indicated by the high risk value over this period
• All variables (age, total charges, payer code, gender, marital status, and census region) except race can be
significant factors and have an impact on the patient’s expiration rate
• The survival probability histogram shows a similar pattern for the validation data at the censoring date and
30 days later. In both cases, the survival probability of almost 65% of the stroke patients falls between
83% and 93%
• The model validation statistic (calculated from the benefit metric) is approximately 80
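To make the idea of a survival probability over length of stay concrete, the sketch below computes a Kaplan-Meier estimate in Python. This is a standard non-parametric estimator swapped in for illustration; it is not the model fitted by the survival analysis node in SAS Enterprise Miner. Durations are assumed to be lengths of stay in days, events follow the ‘Disc Expired’ coding, and the sample values are made up.

```python
def kaplan_meier(durations, events):
    """Kaplan-Meier survival curve.

    durations: time on study for each patient (here, length of stay in days)
    events:    1 if the patient expired at that time, 0 if censored
    Returns a list of (time, survival probability) at each time with a death.
    """
    pairs = sorted(zip(durations, events))
    at_risk = len(pairs)
    survival = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        deaths = 0
        leaving = 0
        # Gather every patient whose stay ended at time t.
        while i < len(pairs) and pairs[i][0] == t:
            deaths += pairs[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # deaths and censored patients both leave the risk set
    return curve


if __name__ == "__main__":
    # Made-up lengths of stay (days) and expiration flags, for illustration only.
    stays = [2, 3, 3, 5, 8]
    expired = [1, 0, 1, 1, 0]
    for day, probability in kaplan_meier(stays, expired):
        print(day, round(probability, 3))
```

Patients discharged alive are treated as censored at discharge, which is the usual reading of a 0 in the target variable.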
FUTURE SCOPE
The scope of this study can be extended in the future by adding patients' re-admission rates and medication
data, which would deepen our understanding of the results and provide more insights for data analysis.
