OPIM 5604 | Team 3
OPIM 5604 - PREDICTIVE MODELING
Predicting Hospital Readmission Rates within 30 days for
Diabetic Patients
TEAM 3
Yashi Sarbhai
Piyush Bishnoi
Manu Shankar
Muhammad Sanan Akbar
Mounika Paladugu
Contents
1.0 Executive Summary:
2.0 Problem Statement:
3.0 Methodology
3.1 DATASET OVERVIEW
3.1.1 Attributes and Target Variable Table
3.2 Data Exploration Techniques
3.2.1 Data Cleaning
3.2.2 Dimension Reduction
3.2.3 Missing Value Detection
3.2.4 Outlier Detection and Treatment.
4.0 Modification
4.1 Recoding Categorical Values
4.2 Rare Event Sampling
5.0 Modeling
5.1 Nominal Logistic
5.2 Neural Networks
5.3 Decision Trees:
5.4 Boosted Tree:
5.5 BootStrap Forest
5.6 Naïve Bayes
6.0 Assess
6.1 Model Comparison:
6.2 Model Improvement
7.0 Results and Conclusion:
7.1 Business Value of the Model:
7.2 Conclusion
8.0 References
1.0 Executive Summary:
A patient is considered ‘readmitted’ when, after a hospital stay, he or she needs to be admitted to a
hospital again with the same problem within 30 days. A high number of hospital readmissions indicates
inefficiency in the healthcare system and adds to the cost of treatment. Healthcare markets and
government healthcare agencies therefore use 30-day readmission as an index of the quality of
treatment provided, as a way to assess performance, as a quality-control measure, and as a target
for cost reduction. Identifying which patients are likely to be readmitted will enable healthcare
providers to improve their service, perform any additional investigations that are needed, and
preferably prevent future readmissions.
The National Diabetes Statistics Report stated that 9.3% of the population of the United States has
diabetes, of which 28% are still undiagnosed. According to the Current US Medical Report, there are
approximately 0.1 million diabetic patients, and readmission treatment for them costs around $250
million. The 30-day readmission rate for diabetic patients is found to be 13-25%, which is
considerably higher than the rate for hospitalized patients in general (8-14%).
2.0 Problem Statement:
The Hospital Readmission Reduction Program, introduced by the Affordable Care Act, was started to
improve the quality of medical statements and treatment and to reduce spending on readmissions. We
are trying to predict the readmission of diabetic patients within 30 days from the given dataset.
Readmission cannot be prevented entirely, but the model developed and its predictions can be used to
reduce readmissions if the necessary measures are taken and implemented. Real-world data on 100,061
patients was collected; it has 50 parameters covering medical details related to patients,
diagnoses, hospitals, lab tests, etc. The first major task is to identify the parameters that
contribute directly to readmission and to derive the trend. The collected data has a large number of
missing values and much redundant information. The model developed is expected to predict the
readmission of diabetic patients within 30 days with significant accuracy. The study describes data
collection, data preparation, dimension reduction, the models deployed and their accuracy, and the
interesting observations and patterns identified.
3.0 Methodology
3.1 DATASET OVERVIEW
The dataset was extracted from the UCI Machine Learning Repository and represents 10 years
(1999-2008) of clinical care at 130 US hospitals and integrated delivery networks, including
numerous features representing patient and hospital outcomes.
The dataset contains 101,766 instances and 50 column attributes.
The target variable in this data is the “Readmitted” column, which classifies patients by the number
of days to readmission (<30, >30, NO).
Link: https://archive.ics.uci.edu/ml/datasets/diabetes+130-us+hospitals+for+years+1999-2008#
3.1.1 Attributes and Target Variable Table
List of features and their descriptions in the initial dataset.
Attribute | Type | Description and values | % missing
Encounter ID | Numeric | Unique identifier of an encounter | 0%
Patient number | Numeric | Unique identifier of a patient | 0%
Race | Nominal | Values: Caucasian, Asian, African American, Hispanic, and other | 2%
Gender | Nominal | Values: male, female, and unknown/invalid | 0%
Age | Nominal | Grouped in 10-year intervals: [0, 10), [10, 20), …, [90, 100) | 0%
Weight | Numeric | Weight in pounds | 97%
Admission type | Nominal | Integer identifier corresponding to 9 distinct values, for example, emergency, urgent, elective, newborn, and not available | 0%
Discharge disposition | Nominal | Integer identifier corresponding to 29 distinct values, for example, discharged to home, expired, and not available | 0%
Admission source | Nominal | Integer identifier corresponding to 21 distinct values, for example, physician referral, emergency room, and transfer from a hospital | 0%
Time in hospital | Numeric | Integer number of days between admission and discharge | 0%
Payer code | Nominal | Integer identifier corresponding to 23 distinct values, for example, Blue Cross/Blue Shield, Medicare, and self-pay | 52%
Medical specialty | Nominal | Integer identifier of a specialty of the admitting physician, corresponding to 84 distinct values, for example, cardiology, internal medicine, family/general practice, and surgeon | 53%
Number of lab procedures | Numeric | Number of lab tests performed during the encounter | 0%
Number of procedures | Numeric | Number of procedures (other than lab tests) performed during the encounter | 0%
Number of medications | Numeric | Number of distinct generic names administered during the encounter | 0%
Number of outpatient visits | Numeric | Number of outpatient visits of the patient in the year preceding the encounter | 0%
Number of emergency visits | Numeric | Number of emergency visits of the patient in the year preceding the encounter | 0%
Number of inpatient visits | Numeric | Number of inpatient visits of the patient in the year preceding the encounter | 0%
Diagnosis 1 | Nominal | The primary diagnosis (coded as first three digits of ICD9); 848 distinct values | 0%
Diagnosis 2 | Nominal | Secondary diagnosis (coded as first three digits of ICD9); 923 distinct values | 0%
Diagnosis 3 | Nominal | Additional secondary diagnosis (coded as first three digits of ICD9); 954 distinct values | 1%
Number of diagnoses | Numeric | Number of diagnoses entered to the system | 0%
Glucose serum test result | Nominal | Indicates the range of the result or if the test was not taken. Values: “>200,” “>300,” “normal,” and “none” if not measured | 0%
A1c test result | Nominal | Indicates the range of the result or if the test was not taken. Values: “>8” if the result was greater than 8%, “>7” if the result was greater than 7% but less than 8%, “normal” if the result was less than 7%, and “none” if not measured | 0%
Change of medications | Nominal | Indicates if there was a change in diabetic medications (either dosage or generic name). Values: “change” and “no change” | 0%
Diabetes medications | Nominal | Indicates if there was any diabetic medication prescribed. Values: “yes” and “no” | 0%
23 features for medications | Nominal | For the generic names: metformin, repaglinide, nateglinide, chlorpropamide, glimepiride, acetohexamide, glipizide, glyburide, tolbutamide, pioglitazone, rosiglitazone, acarbose, miglitol, troglitazone, tolazamide, examide, insulin, glyburide-metformin, glipizide-metformin, glimepiride-pioglitazone, metformin-rosiglitazone, and metformin-pioglitazone, the feature indicates whether the drug was prescribed or there was a change in the dosage. Values: “up” if the dosage was increased during the encounter, “down” if the dosage was decreased, “steady” if the dosage did not change, and “no” if the drug was not prescribed | 0%
Readmitted | Nominal | Days to inpatient readmission. Values: “<30” if the patient was readmitted in less than 30 days, “>30” if the patient was readmitted in more than 30 days, and “No” for no record of readmission | 0%
3.2 Data Exploration Techniques
The core objective of data exploration is to remove redundant data from the rows and columns that
are less significant in predicting the target variable. The steps below were followed in exploring
and processing the data:
3.2.1 Data Cleaning
A new value, “NA”, was created and imputed for the uninformative values in the instances:
1. Admission_type_id -> Values 5, 6, and 8 (Not Available, Null, and Not Mapped) are converted to
NA, and 7 (Trauma Center) is recoded to 5.
2. Discharge_disposition_id -> Values 18, 25, and 26 (Null, Not Mapped, Invalid) are converted to NA.
3. Admission_source_id -> Values 9, 15, 17, 20, and 21 (Not Available, Null, Not Mapped, Invalid)
are converted to NA.
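The recoding rules above can be sketched in a few lines of Python. The report did this work in JMP, so this is only an illustration of the rules as listed, not the team's actual procedure. The lowercase column names and the `clean_record` helper are assumptions made for this example.

```python
# Hypothetical helper; the code lists mirror the cleaning rules listed above.
NA = "NA"

ADMISSION_TYPE_NA = {5, 6, 8}
DISCHARGE_DISPOSITION_NA = {18, 25, 26}
ADMISSION_SOURCE_NA = {9, 15, 17, 20, 21}


def clean_record(record):
    """Return a copy of one encounter with uninformative ID codes set to NA."""
    cleaned = dict(record)
    admission_type = cleaned["admission_type_id"]
    if admission_type in ADMISSION_TYPE_NA:
        cleaned["admission_type_id"] = NA
    elif admission_type == 7:
        # Trauma Center is recoded to 5, the code freed by the NA conversion.
        cleaned["admission_type_id"] = 5
    if cleaned["discharge_disposition_id"] in DISCHARGE_DISPOSITION_NA:
        cleaned["discharge_disposition_id"] = NA
    if cleaned["admission_source_id"] in ADMISSION_SOURCE_NA:
        cleaned["admission_source_id"] = NA
    return cleaned


# Toy encounters (illustrative values only).
rows = [
    {"admission_type_id": 6, "discharge_disposition_id": 1, "admission_source_id": 7},
    {"admission_type_id": 7, "discharge_disposition_id": 25, "admission_source_id": 17},
]
cleaned_rows = [clean_record(row) for row in rows]
```

Working on a copy of each record keeps the raw extract untouched, which makes it easier to re-run the cleaning if the code lists change.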
3.2.2 Dimension Reduction
Dimension reduction was applied during exploration to reduce the number of variables. The goal was
to identify the minimum number of relevant attributes with non-overlapping information.
The measures below were taken:
3.2.2.1 Column Removal.
The following columns were removed from the dataset.
S. No. | Attribute | Reason for removal
1 | Weight | 98,569 values were missing out of a total of 101,766 rows, which accounts for about 96.9%.
2 | Payer code | Payer code signifies the mode of payment for different patients and is not significant to the problem statement.
3 | Medical specialty | Medical specialty has around 50% missing values and therefore needs to be removed.
4 | Diag1, Diag2, Diag3 | They are nominal variables with around 1,000 distinct possible values. They are codes used for medical purposes. These columns are removed to reduce the complexity of the model (a trade-off between complexity and accuracy).
3.2.2.2 Derived Column
medical_procedures: This column is the sum of num_lab_procedures, num_procedures, and
num_medications. It represents the individual’s dependence on or interaction with the hospital.
Previous_number_of_visits: number_outpatient, number_emergency, and number_inpatient are combined
into a single attribute, number_of_visits, which is the sum of the three columns.
Diabetes_medications: The columns metformin through metformin-pioglitazone are converted to a scale
of 0 (no) and 1 (yes). These values are then summed to reduce the number of columns from 23 to 1.
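A minimal sketch of the three derived columns follows. The source column names follow the public dataset's naming and are an assumption here, `add_derived_columns` is a hypothetical helper, and only three of the medication columns are listed to keep the example short.

```python
# Subset of the medication columns, for illustration only.
MEDICATION_COLUMNS = ["metformin", "insulin", "glipizide"]


def add_derived_columns(row):
    """Return a copy of one encounter with the three derived attributes added."""
    derived = dict(row)
    derived["medical_procedures"] = (
        row["num_lab_procedures"] + row["num_procedures"] + row["num_medications"]
    )
    derived["number_of_visits"] = (
        row["number_outpatient"] + row["number_emergency"] + row["number_inpatient"]
    )
    # Each medication counts as 1 when prescribed (Steady, Up, Down) and 0 when "No".
    derived["diabetes_medications"] = sum(
        1 for column in MEDICATION_COLUMNS if row[column] != "No"
    )
    return derived


# Toy encounter (illustrative values only).
encounter = {
    "num_lab_procedures": 40, "num_procedures": 1, "num_medications": 12,
    "number_outpatient": 0, "number_emergency": 1, "number_inpatient": 2,
    "metformin": "Steady", "insulin": "Up", "glipizide": "No",
}
derived = add_derived_columns(encounter)
```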
3.2.2.3 Principal Component Analysis.
Principal component analysis was computed for 5 attributes (time in hospital, medical_procedures,
previous visits, number of diagnoses, diabetes_medications). Four of the five components were
chosen. This decision was made keeping in mind the complexity versus the accuracy of the model.
As a result of all these techniques, we were able to reduce the attributes from 50 to 18, which means
that dimensionality was reduced by more than 50%.
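To make the "components chosen" step concrete, the sketch below computes the share of variance carried by each principal component, which is the usual basis for deciding how many to keep. It is not the JMP procedure used in the report: it uses plain power iteration with deflation (a swapped-in technique that needs only the standard library), and the two-column toy data is illustrative. In practice the inputs would typically be standardized first.

```python
import math


def covariance_matrix(rows):
    """Population covariance matrix of a list of equal-length numeric rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    return [
        [sum((row[i] - means[i]) * (row[j] - means[j]) for row in rows) / n
         for j in range(d)]
        for i in range(d)
    ]


def top_eigenpair(matrix, iterations=500):
    """Largest eigenvalue and its unit eigenvector, found by power iteration."""
    d = len(matrix)
    vec = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            break
        vec = [x / norm for x in nxt]
    value = sum(vec[i] * matrix[i][j] * vec[j] for i in range(d) for j in range(d))
    return value, vec


def explained_variance_ratios(rows):
    """Share of total variance carried by each principal component, largest first."""
    matrix = covariance_matrix(rows)
    d = len(matrix)
    total = sum(matrix[i][i] for i in range(d))
    ratios = []
    for _ in range(d):
        value, vec = top_eigenpair(matrix)
        ratios.append(value / total)
        # Deflate: subtract the component just found before searching for the next.
        matrix = [
            [matrix[i][j] - value * vec[i] * vec[j] for j in range(d)]
            for i in range(d)
        ]
    return ratios


# Toy data with variance 0.5 along the first axis and 0.125 along the second.
toy_rows = [(1.0, 0.0), (-1.0, 0.0), (0.0, 0.5), (0.0, -0.5)]
ratios = explained_variance_ratios(toy_rows)
```

Summing the ratios from the largest down shows how much variance a given number of components retains. Power iteration can stall if the starting vector happens to be orthogonal to the leading component, so a production analysis would rely on a tested linear algebra routine.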
3.2.3 Missing Value Detection
Missing value detection and treatment played an essential role in data processing. The two
approaches below were followed to treat the missing values.
● All missing values were first identified and imputed with a value called ‘NA’
● Attributes that had a majority of ‘NA’ values were dropped from the dataset
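The two steps above can be sketched as follows. The missing-value markers and the 0.5 threshold (read here as "a majority") are assumptions for this example, and the helper names are hypothetical. The last line re-checks the share quoted for Weight in the column-removal table.

```python
MISSING_MARKERS = ("?", "NA", None)  # assumed markers; adjust to the actual file


def missing_fraction(rows, column):
    """Share of rows whose value in `column` is a missing-value marker."""
    missing = sum(1 for row in rows if row.get(column) in MISSING_MARKERS)
    return missing / len(rows)


def drop_sparse_columns(rows, threshold=0.5):
    """Return the rows without columns whose missing share exceeds `threshold`."""
    kept = [column for column in rows[0] if missing_fraction(rows, column) <= threshold]
    return [{column: row[column] for column in kept} for row in rows], kept


# Toy records (illustrative values only, not taken from the dataset).
rows = [
    {"race": "Caucasian", "weight": "?", "time_in_hospital": 3},
    {"race": "?", "weight": "?", "time_in_hospital": 5},
    {"race": "Asian", "weight": "?", "time_in_hospital": 2},
    {"race": "Hispanic", "weight": "[75-100)", "time_in_hospital": 7},
]
reduced_rows, kept_columns = drop_sparse_columns(rows)

# Weight: 98,569 missing values out of 101,766 rows, as a percentage.
weight_missing_pct = round(98569 / 101766 * 100, 1)
```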
3.2.4 Outlier Detection and Treatment.
As part of outlier detection, we identified the attributes containing outliers. From a business
perspective, these were not outliers but actual values. Hence, no values were removed.
For example, the number of visits ranged from 0 to 80, but such values cannot be ruled out.
4.0 Modification
4.1 Recoding Categorical Values
Using the “Recode” option in JMP, categorical values were reassigned to suit the business needs and
the research conducted.
S. No. | Attribute | Recoded values applied
1 | Age | According to various studies, age groups were coded as below:
● 0-40 years → 1
● 40-70 years → 2
● Above 70 → 3
2 | Max_glu_serum | Max Glu Serum signifies the sugar levels:
● None → 0
● Normal → 1
● Abnormal (>200 & >300) → 2
3 | A1Cresult | The A1C result is a blood test that reflects average blood glucose levels over the past 3 months. The results were coded as:
● None → 0
● Normal → 1
● Abnormal (>7 & >8) → 2
4 | Medications | All the diabetes medications (with values No, Steady, Up, Down) were divided into 0 or 1:
● Steady, Up, Down → 1
● No → 0
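The recoding table can be expressed as small lookup rules. The report applied these with JMP's Recode option, so the Python below is only an equivalent sketch. It assumes age brackets arrive as strings such as "[40-50)" and that test results use the labels from the attribute table; both are assumptions about the raw file.

```python
GLUCOSE_CODES = {"none": 0, "normal": 1, ">200": 2, ">300": 2}
A1C_CODES = {"none": 0, "normal": 1, ">7": 2, ">8": 2}


def recode_age(age_bracket):
    """Map a bracket such as '[40-50)' to 1 (0-40), 2 (40-70), or 3 (above 70)."""
    lower_bound = int(age_bracket.strip("[]()").split("-")[0])
    if lower_bound < 40:
        return 1
    if lower_bound < 70:
        return 2
    return 3


def recode_test_result(value, codes):
    """Look up a glucose or A1C label, ignoring case."""
    return codes[value.lower()]


def recode_medication(value):
    """0 when the drug was not prescribed, 1 for Steady, Up, or Down."""
    return 0 if value.lower() == "no" else 1


# Example calls on illustrative values.
age_codes = [recode_age(bracket) for bracket in ("[0-10)", "[40-50)", "[70-80)")]
glucose_code = recode_test_result(">300", GLUCOSE_CODES)
a1c_code = recode_test_result("Normal", A1C_CODES)
medication_codes = [recode_medication(v) for v in ("No", "Steady", "Up", "Down")]
```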
4.2 Rare Event Sampling
● Simple random sampling may produce too few of the rare class to yield useful information about
what distinguishes them from the dominant class. In such cases, stratified sampling is often used
to oversample cases from the rare class and improve the performance of classifiers.
● In our case, records with the target variable equal to ‘Yes’ were too rare to produce accurate
results. Hence, stratified sampling was applied to obtain a balanced ratio of ‘Yes’ in the data sample.
● As a result, the total number of instances was reduced to 38,525, with 11,357 rows having the
target variable “Readmitted” as “Yes”.
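One way to obtain such a sample is to keep every rare-class row and draw a random subset of the dominant class. The sketch below does that; it is an assumption about the mechanics, consistent with the total shrinking while the ‘Yes’ rows are retained, and not a description of the JMP steps used. With the reported counts, the ‘Yes’ share of the sample is 11,357 / 38,525, or about 29.5%. The function name, its parameters, and the fixed seed are hypothetical.

```python
import random


def undersample(rows, label_key, rare_label, rare_share, seed=0):
    """Keep all rare-class rows and sample the rest so the rare share is `rare_share`."""
    rare = [row for row in rows if row[label_key] == rare_label]
    common = [row for row in rows if row[label_key] != rare_label]
    # Number of dominant-class rows so that rare / (rare + kept) equals rare_share.
    n_common = min(len(common), round(len(rare) * (1 - rare_share) / rare_share))
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    sample = rare + rng.sample(common, n_common)
    rng.shuffle(sample)
    return sample


# Toy population: 10 readmitted and 90 not readmitted (illustrative only).
population = (
    [{"id": i, "readmitted": "Yes"} for i in range(10)]
    + [{"id": i, "readmitted": "No"} for i in range(10, 100)]
)
sample = undersample(population, "readmitted", "Yes", rare_share=0.25)
yes_count = sum(1 for row in sample if row["readmitted"] == "Yes")
```

Because the class balance of the sample differs from that of the population, predicted probabilities from a model trained on it are shifted toward the rare class, which is worth keeping in mind when a cutoff is chosen.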
5.0 Modeling
This is supervised learning for classification. After the data was processed, various
classification models were tested to identify the model with maximum accuracy and AUC and minimum
misclassification rate. The models below were executed to study their performance.
5.1 Nominal Logistic
5.2 Neural Networks
5.3 Decision Trees:
5.4 Boosted Tree:
5.5 BootStrap Forest
5.6 Naïve Bayes
6.0 Assess
6.1 Model Comparison:
Model | False negatives (people who are readmitted but predicted as No) | Accuracy of the model | AUC | Misclassification rate
Logistic Regression | 932 | 63.7% | 0.6612 | 0.36
Neural Networks | 844 | 60.6% | 0.6570 | 0.39
Decision Trees | 877 | 58.1% | 0.6299 | 0.41
Boosted Tree | 952 | 61.4% | 0.6365 | 0.38
Bootstrap Forest | 895 | 60.9% | 0.6515 | 0.39
Naïve Bayes | 1098 | 65.5% | 0.6412 | 0.35
Across all models, we prioritize false negatives. The Neural Networks model has the fewest false
negatives, although its total accuracy is lower than that of Naïve Bayes, Logistic Regression, and
others. We therefore choose this model. A false negative is a readmitted patient wrongly predicted
as ‘NO’; this hurts the accuracy of the prediction, and such patients will probably not receive
proper treatment if our prediction is wrong.
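The effect of the selection criterion can be checked directly against the comparison table. The sketch below encodes the reported figures and picks the best model under three different criteria; the `ModelResult` type is a hypothetical container introduced for this example.

```python
from typing import NamedTuple


class ModelResult(NamedTuple):
    name: str
    false_negatives: int
    accuracy: float
    auc: float


# Figures copied from the model comparison table.
RESULTS = [
    ModelResult("Logistic Regression", 932, 0.637, 0.6612),
    ModelResult("Neural Networks", 844, 0.606, 0.6570),
    ModelResult("Decision Trees", 877, 0.581, 0.6299),
    ModelResult("Boosted Tree", 952, 0.614, 0.6365),
    ModelResult("Bootstrap Forest", 895, 0.609, 0.6515),
    ModelResult("Naïve Bayes", 1098, 0.655, 0.6412),
]

best_by_false_negatives = min(RESULTS, key=lambda m: m.false_negatives)
best_by_accuracy = max(RESULTS, key=lambda m: m.accuracy)
best_by_auc = max(RESULTS, key=lambda m: m.auc)
```

Each criterion selects a different model: Neural Networks on false negatives, Naïve Bayes on accuracy, and Logistic Regression on AUC. The choice of Neural Networks therefore follows from prioritizing false negatives rather than from overall accuracy.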
6.2 Model Improvement
We altered the cutoff, trying different values and lowering it to reduce false negatives, trading
off total accuracy against the reduction in false negatives. In the end, we brought the number of
false negatives down from 844 to 591 while maintaining our total accuracy.
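A cutoff sweep of this kind can be sketched as follows. The probabilities and labels are toy values made up for illustration, not model output from the report, and the function names are hypothetical. A score at or above the cutoff is treated as a predicted readmission.

```python
def confusion_counts(probabilities, labels, cutoff):
    """Return (tp, fp, tn, fn) when scores >= cutoff are predicted as readmitted."""
    tp = fp = tn = fn = 0
    for probability, label in zip(probabilities, labels):
        predicted = 1 if probability >= cutoff else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def sweep_cutoffs(probabilities, labels, cutoffs):
    """List (cutoff, false negatives, accuracy) for each candidate cutoff."""
    report = []
    for cutoff in cutoffs:
        tp, fp, tn, fn = confusion_counts(probabilities, labels, cutoff)
        report.append((cutoff, fn, (tp + tn) / len(labels)))
    return report


# Toy scores: the first four patients were readmitted, the last four were not.
probabilities = [0.90, 0.60, 0.48, 0.47, 0.30, 0.55, 0.46, 0.20]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
report = sweep_cutoffs(probabilities, labels, [0.50, 0.45])
```

Lowering the cutoff can only move predictions from "No" to "Yes", so false negatives never increase while false positives never decrease. The sweep makes that trade-off visible for each candidate value.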
7.0 Results and Conclusion:
Hospitals should use the Neural Network model to predict whether a patient will need to be
readmitted within 30 days. The cutoff for the Neural Network model should be kept at 0.45, as this
achieves the target of minimizing false negatives. Stratified sampling was used because of the
rarity of the target variable, which means the accuracy of the model might be compromised.
7.1 Business Value of the Model:
Our project develops a predictive tool that lets health service providers identify patients at risk
of readmission within 30 days with an accuracy of 60.6%. From the model profiler we can see that the
number of inpatient visits contributes to the risk of readmission, so hospitals should provide more
care to those patients.
Model cost with cutoff 0.45:
False negatives = 591, false positives = 9,471: (591 × $1,000) + (9,471 × $200) = $2,485,200
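The cost figure can be reproduced with a short calculation. The unit costs of $1,000 per false negative and $200 per false positive are the ones used in the report; the function name is hypothetical.

```python
FALSE_NEGATIVE_COST = 1000  # dollars per missed readmission, as used in the report
FALSE_POSITIVE_COST = 200   # dollars per patient wrongly flagged, as used in the report


def misclassification_cost(false_negatives, false_positives):
    """Total cost in dollars of the model's errors at a given cutoff."""
    return (
        false_negatives * FALSE_NEGATIVE_COST
        + false_positives * FALSE_POSITIVE_COST
    )


cost_at_cutoff_045 = misclassification_cost(591, 9471)
```

Because a false negative is costed at five times a false positive, a lower cutoff pays off only while each avoided false negative adds fewer than five false positives.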
8.0 References
https://care.diabetesjournals.org/
https://www.einstein.yu.edu/
www.idf.org/millions-unite-diabetes-awareness-world-diabetes-day-2010
https://www.kff.org/medicare/issue-brief/aiming-for-fewer-hospital-u-turns-the-medicare-hospital-readmission-reduction-program/
https://yaledailynews.com/blog/2017/01/17/medicare-penalties-lead-to-decline-in-hospital-readmission-rates/
Predictive Modeling: White Paper

  • 1. OPIM 5604 | Team 3 OPIM 5604 - PREDICTIVE MODELING Predicting Hospital Readmission Rates within 30 days for Diabetic Patients TEAM 3 Yashi Sarbhai Piyush Bishnoi Manu Shankar Muhammad Sanan Akbar Mounika Paladugu
  • 2. Contents
    1.0 Executive Summary
    2.0 Problem Statement
    3.0 Methodology
    3.1 Dataset Overview
    3.1.1 Attributes and Target Variable Table
    3.2 Data Exploration Techniques
    3.2.1 Data Cleaning
    3.2.2 Dimension Reduction
    3.2.3 Missing Value Detection
    3.2.4 Outlier Detection and Treatment
    4.0 Modification
    4.1 Recoding Categorical Values
    4.2 Rare Event Sampling
    5.0 Modeling
    5.1 Nominal Logistic
    5.2 Neural Networks
    5.3 Decision Trees
    5.4 Boosted Tree
    5.5 Bootstrap Forest
    5.6 Naïve Bayes
    6.0 Assess
    6.1 Model Comparison
    6.2 Model Improvement
    7.0 Results and Conclusion
    7.1 Business Value of the Model
    7.2 Conclusion
    8.0 References
  • 3. 1.0 Executive Summary
    A patient is considered "readmitted" when, after a hospital stay, they need to be admitted to a hospital again with the same problem within 30 days. The number of hospital readmissions indicates inefficiency in healthcare systems and additional treatment costs. Healthcare markets and government healthcare agencies therefore use 30-day readmission as an index of the quality of treatment provided, as a way to assess their performance, as a quality control measure, and as a target for cost reduction. Identifying patients who are likely to be readmitted will enable healthcare providers to improve their service, perform any additional investigations that are needed, and preferably prevent readmission in the future.
    The National Diabetic Statistics report stated that 9.3% of the population of the United States has diabetes, of which 28% are still undiagnosed. According to the Current US Medical Report, there are approximately 0.1 million diabetic patients, and readmission treatment for them costs around $250 million. The 30-day readmission rate for diabetes is found to be 13-25%, which is considerably higher than the rate for hospitalized patients (8-14%).
    2.0 Problem Statement
    The Hospital Readmission Reduction Program, introduced under the Affordable Care Act, was started to improve the quality of medical statements and treatment and to reduce spending on readmissions. We are trying to predict the readmission of diabetic patients within 30 days from the given dataset. Readmissions cannot be prevented entirely, but the model and its predictions can be used to reduce them if the necessary measures are taken and implemented. Real data for 100061 patients was collected; it has 50 parameters covering medical details related to patients, diagnoses, the hospital, lab tests and so on. The first major task is to identify the parameters that contribute directly to readmission and to derive the trend. The collected data has a large number of missing values and much redundant information. The model is expected to predict the readmission of diabetic patients within 30 days with significant accuracy.
    This study describes data collection, data preparation, dimension reduction, the models deployed and their accuracy, and the interesting observations and patterns identified.
    3.0 Methodology
    3.1 Dataset Overview
    The dataset was extracted from the UCI Machine Learning Repository and represents 10 years (1999-2008) of clinical care at 130 US hospitals and integrated delivery networks, including numerous features representing patient and hospital outcomes. The dataset contains 101,766 instances and 50 column attributes. The target variable is the "Readmitted" column, which classifies patients by days to readmission (<30, >30, NO).
    Link: https://archive.ics.uci.edu/ml/datasets/diabetes+130-us+hospitals+for+years+1999-2008#
  • 4. 3.1.1 Attributes and Target Variable Table
    List of features and their descriptions in the initial dataset.
    Attribute | Type | Description and values | % missing
    Encounter ID | Numeric | Unique identifier of an encounter | 0%
    Patient number | Numeric | Unique identifier of a patient | 0%
    Race | Nominal | Values: Caucasian, Asian, African American, Hispanic, and other | 2%
    Gender | Nominal | Values: male, female, and unknown/invalid | 0%
    Age | Nominal | Grouped in 10-year intervals: [0, 10), [10, 20), …, [90, 100) | 0%
    Weight | Numeric | Weight in pounds | 97%
    Admission type | Nominal | Integer identifier corresponding to 9 distinct values, for example, emergency, urgent, elective, newborn, and not available | 0%
    Discharge disposition | Nominal | Integer identifier corresponding to 29 distinct values, for example, discharged to home, expired, and not available | 0%
    Admission source | Nominal | Integer identifier corresponding to 21 distinct values, for example, physician referral, emergency room, and transfer from a hospital | 0%
    Time in hospital | Numeric | Integer number of days between admission and discharge | 0%
    Payer code | Nominal | Integer identifier corresponding to 23 distinct values, for example, Blue Cross/Blue Shield, Medicare, and self-pay | 52%
    Medical specialty | Nominal | Integer identifier of a specialty of the admitting physician, corresponding to 84 distinct values, for example, cardiology, internal medicine, family/general practice, and surgeon | 53%
    Number of lab procedures | Numeric | Number of lab tests performed during the encounter | 0%
    Number of procedures | Numeric | Number of procedures (other than lab tests) performed during the encounter | 0%
    Number of medications | Numeric | Number of distinct generic names administered during the encounter | 0%
    Number of outpatient visits | Numeric | Number of outpatient visits of the patient in the year preceding the encounter | 0%
    Number of emergency visits | Numeric | Number of emergency visits of the patient in the year preceding the encounter | 0%
    Number of inpatient visits | Numeric | Number of inpatient visits of the patient in the year preceding the encounter | 0%
    Diagnosis 1 | Nominal | The primary diagnosis (coded as first three digits of ICD9); 848 distinct values | 0%
    Diagnosis 2 | Nominal | Secondary diagnosis (coded as first three digits of ICD9); 923 distinct values | 0%
  • 5. Attribute | Type | Description and values | % missing (table continued)
    Diagnosis 3 | Nominal | Additional secondary diagnosis (coded as first three digits of ICD9); 954 distinct values | 1%
    Number of diagnoses | Numeric | Number of diagnoses entered to the system | 0%
    Glucose serum test result | Nominal | Indicates the range of the result or if the test was not taken. Values: ">200," ">300," "normal," and "none" if not measured | 0%
    A1c test result | Nominal | Indicates the range of the result or if the test was not taken. Values: ">8" if the result was greater than 8%, ">7" if the result was greater than 7% but less than 8%, "normal" if the result was less than 7%, and "none" if not measured | 0%
    Change of medications | Nominal | Indicates if there was a change in diabetic medications (either dosage or generic name). Values: "change" and "no change" | 0%
    Diabetes medications | Nominal | Indicates if there was any diabetic medication prescribed. Values: "yes" and "no" | 0%
    23 features for medications | Nominal | For the generic names: metformin, repaglinide, nateglinide, chlorpropamide, glimepiride, acetohexamide, glipizide, glyburide, tolbutamide, pioglitazone, rosiglitazone, acarbose, miglitol, troglitazone, tolazamide, examide, insulin, glyburide-metformin, glipizide-metformin, glimepiride-pioglitazone, metformin-rosiglitazone, and metformin-pioglitazone, the feature indicates whether the drug was prescribed or there was a change in the dosage. Values: "up" if the dosage was increased during the encounter, "down" if the dosage was decreased, "steady" if the dosage did not change, and "no" if the drug was not prescribed | 0%
    Readmitted | Nominal | Days to inpatient readmission. Values: "<30" if the patient was readmitted in less than 30 days, ">30" if the patient was readmitted in more than 30 days, and "No" for no record of readmission | 0%
    3.2 Data Exploration Techniques
    The core objective of data exploration is to remove redundant data from rows and columns that are less significant in predicting the target variable. The steps below were followed in exploring and processing the data.
    3.2.1 Data Cleaning
    A new value "NA" was created and imputed for the insignificant values in the instances:
    1. Admission_type_id -> Values 5, 6, 8 (Not available, Null, not mapped) are converted to NA, and 7 (Trauma center) is recoded to 5.
    2. Discharge_disposition_id -> 18, 25, 26 (Null, Not mapped, invalid) converted to NA.
  • 6. 3. Admission_source_id -> 9, 15, 17, 20, 21 (Not Available, Null, not mapped, invalid) converted to NA.
    3.2.2 Dimension Reduction
    Dimension reduction was applied during exploration to reduce the number of variables. The target was to identify the minimum number of relevant attributes with non-overlapping information. The measures below were taken.
    3.2.2.1 Column Removal
    The following columns were removed from the dataset.
    S. No. | Attribute | Reason for Removal
    1 | Weight | 98569 values were missing out of a total of 101,766 rows, which accounts for 96.85%.
    2 | Payer Code | Payer code signifies the mode of payment for different patients and is not significant to the problem statement.
    3 | Medical specialty | Medical specialty has around 50% missing values and therefore needs to be removed.
    4 | Diag1, Diag2, Diag3 | They are nominal variables with around 1000 distinct possible values. They are codes used for medical purposes. These columns are removed to reduce the complexity of the model (trade-off between complexity and accuracy).
    3.2.2.2 Derived Column
    medical_procedures: This column is the sum of num_lab_procedures, num_procedures and num_medications. It represents the individual's dependence on, or interaction with, the hospitals.
    Previous_number_of_visits: number_outpatient, number_emergency and number_inpatient are converted to a single attribute, number_of_visits, which is the sum of the three columns.
    Diabetes_medications: The columns metformin to metformin-pioglitazone are converted to a scale of 0 (no) and 1 (yes). These values are then summed to reduce the number of columns from 23 to 1.
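The cleaning and derived-column steps described in sections 3.2.1 and 3.2.2 can be sketched in plain Python. This is only an illustration, not the team's JMP workflow: the column names are assumed to follow the public UCI CSV, `clean_and_derive` is a hypothetical helper, and only three of the medication columns are listed to keep the example short.

```python
# Sketch only: column names are assumed to match the UCI CSV; helper names are hypothetical.
NA = "NA"

# ID codes that the text maps to NA (section 3.2.1).
NA_CODES = {
    "admission_type_id": {5, 6, 8},
    "discharge_disposition_id": {18, 25, 26},
    "admission_source_id": {9, 15, 17, 20, 21},
}

# Columns removed in section 3.2.2.1.
DROPPED = {"weight", "payer_code", "medical_specialty", "diag_1", "diag_2", "diag_3"}

# Illustrative subset of the medication columns (the report sums all of them).
MEDICATIONS = ["metformin", "glipizide", "insulin"]


def clean_and_derive(row):
    """Return a cleaned copy of one encounter record (a dict)."""
    out = {k: v for k, v in row.items() if k not in DROPPED}

    # Map the 'not available / null / not mapped' codes to NA first ...
    for column, codes in NA_CODES.items():
        if out[column] in codes:
            out[column] = NA
    # ... then reuse the freed code 5 for trauma center (originally 7).
    if out["admission_type_id"] == 7:
        out["admission_type_id"] = 5

    # Derived columns: each replaces the columns it is summed from.
    out["medical_procedures"] = (
        out.pop("num_lab_procedures")
        + out.pop("num_procedures")
        + out.pop("num_medications")
    )
    out["number_of_visits"] = (
        out.pop("number_outpatient")
        + out.pop("number_emergency")
        + out.pop("number_inpatient")
    )
    # 0 if the drug was not prescribed, 1 otherwise, summed over all drugs.
    out["diabetes_medications"] = sum(
        1 for name in MEDICATIONS if out.pop(name) != "No"
    )
    return out


example = {
    "admission_type_id": 7, "discharge_disposition_id": 25, "admission_source_id": 1,
    "weight": "?", "payer_code": "?", "medical_specialty": "?",
    "diag_1": "250", "diag_2": "401", "diag_3": "428",
    "num_lab_procedures": 40, "num_procedures": 1, "num_medications": 12,
    "number_outpatient": 0, "number_emergency": 1, "number_inpatient": 2,
    "metformin": "Steady", "glipizide": "No", "insulin": "Up",
}
cleaned = clean_and_derive(example)
```

The NA mapping runs before the 7-to-5 recode on purpose; in the opposite order a trauma-center record would be recoded to 5 and then immediately turned into NA.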
  • 7. 3.2.2.3 Principal Component Analysis
    Principal component analysis was computed for 5 attributes (time in hospital, medical_procedures, previous visits, number of diagnoses, diabetes_medications). Four of the five components were chosen. This decision was made keeping in mind the complexity versus the accuracy of the model.
    As a result of all these techniques, we were able to reduce the attributes from 50 to 18, which means that dimensionality was reduced by more than 50%.
    3.2.3 Missing Value Detection
    Missing value detection and treatment played an essential role in data processing. The two approaches below were followed to treat the missing values.
    ● All the missing values were first identified and imputed with a value called "NA".
    ● Attributes that had a majority of "NA" values were dropped from the dataset.
    3.2.4 Outlier Detection and Treatment
    As part of outlier detection, we identified the attributes containing outliers. From a business perspective, they were not outliers but actual values, so no values were removed. For example, the number of visits ranged from 0 to 80, but such values cannot be ruled out.
    4.0 Modification
    4.1 Recoding Categorical Values
    Using the "Recode" option in JMP, categorical values were reassigned to suit the business needs and the research conducted.
    S. No. | Attribute | Recoded values applied
    1 | Age | According to various studies, age groups were coded as below:
        ● 0-40 years → 1
        ● 40-70 years → 2
        ● Above 70 → 3
    2 | Max_glu_serum | Max Glu Serum signifies the sugar levels:
        ● None → 0
        ● Normal → 1
        ● Abnormal (>200 & >300) → 2
    3 | A1Cresult | A1C is a blood test that reflects average blood glucose levels over the past 3 months. The results were coded as:
  • 8.     ● None → 0
        ● Normal → 1
        ● Abnormal (>7 & >8) → 2
    4 | Medications | All the diabetes medications (with values No, Steady, Up, Down) were divided into 0 or 1:
        ● Steady, Up, Down → 1
        ● No → 0
    4.2 Rare Event Sampling
    ● Simple random sampling may produce too few of the rare class to yield useful information about what distinguishes them from the dominant class. In such cases stratified sampling is often used to oversample cases from the rare class and improve the performance of classifiers.
    ● In our case, instances with the target variable equal to "yes" were too rare to produce accurate results. Hence, stratified sampling was applied to obtain a balanced ratio of "yes" in the data sample.
    ● As a result, the total number of instances was reduced to 38525, with 11357 rows having the target variable "Readmitted" as "yes".
    5.0 Modeling
    This is a supervised learning problem for classification. After the data was processed, various classification models were tested to identify the model with maximum accuracy and AUC and minimum misclassification rate. The models below were executed to study the performance.
    5.1 Nominal Logistic
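The three selection criteria named in section 5.0 (accuracy, AUC and misclassification rate) can be computed from any classifier's predicted probabilities. The sketch below is a plain-Python illustration with hypothetical function names and made-up scores; the study itself obtained these figures from JMP, and the 0.5 default cutoff is an assumption.

```python
def accuracy(y_true, scores, cutoff=0.5):
    """Share of records whose thresholded prediction matches the label."""
    predictions = [1 if s >= cutoff else 0 for s in scores]
    correct = sum(1 for y, p in zip(y_true, predictions) if y == p)
    return correct / len(y_true)


def misclassification_rate(y_true, scores, cutoff=0.5):
    """Complement of accuracy."""
    return 1.0 - accuracy(y_true, scores, cutoff)


def auc(y_true, scores):
    """Area under the ROC curve as the probability that a random positive
    outranks a random negative (ties count as one half)."""
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up labels (1 = readmitted within 30 days) and predicted probabilities.
labels = [1, 0, 1, 0]
probabilities = [0.9, 0.8, 0.4, 0.1]
```

The pairwise AUC is quadratic in the number of records, which is fine for a small illustration but would be replaced by a rank-based formula on a sample of tens of thousands of rows.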
  • 9. 5.2 Neural Networks
    5.3 Decision Trees
    5.4 Boosted Tree
  • 10. 5.5 Bootstrap Forest
  • 11. 5.6 Naïve Bayes
    6.0 Assess
    6.1 Model Comparison
    Model | False negatives (people who are readmitted but predicted as No) | Accuracy | AUC | Misclassification rate
    Logistic Regression | 932 | 63.7% | 0.6612 | 0.36
    Neural Networks | 844 | 60.6% | 0.6570 | 0.39
    Decision Trees | 877 | 58.1% | 0.6299 | 0.41
    Boosted Tree | 952 | 61.4% | 0.6365 | 0.38
    Bootstrap Forest | 895 | 60.9% | 0.6515 | 0.39
    Naïve Bayes | 1098 | 65.5% | 0.6412 | 0.35
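As a small illustration of how the comparison table can be read, the snippet below ranks the six models by each criterion. The figures are copied from the table; the variable names are hypothetical.

```python
# (model, false negatives, accuracy in %, AUC), copied from the comparison table.
RESULTS = [
    ("Logistic Regression", 932, 63.7, 0.6612),
    ("Neural Networks", 844, 60.6, 0.6570),
    ("Decision Trees", 877, 58.1, 0.6299),
    ("Boosted Tree", 952, 61.4, 0.6365),
    ("Bootstrap Forest", 895, 60.9, 0.6515),
    ("Naïve Bayes", 1098, 65.5, 0.6412),
]

fewest_false_negatives = min(RESULTS, key=lambda r: r[1])[0]
highest_accuracy = max(RESULTS, key=lambda r: r[2])[0]
highest_auc = max(RESULTS, key=lambda r: r[3])[0]
```

The three criteria pick three different models, so the choice depends on which error matters most for the application.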
  • 12. Across all models, we prioritize false negatives. The Neural Networks model has the fewest false negatives, although its total accuracy is lower than that of Naïve Bayes, Logistic Regression and some of the other models. We therefore choose this model for prediction. A false negative is a readmitted patient wrongly predicted as "NO"; it reduces the accuracy of prediction and means that the patient will probably not receive the proper treatment.
    6.2 Model Improvement
    We alter the cutoff, trying different lower values to reduce false negatives, trading off total accuracy against the reduction in false negatives. Finally, we could bring the false negatives down from 844 to 591 while maintaining our total accuracy.
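The effect of lowering the cutoff can be illustrated with a few made-up predicted probabilities. This is a sketch under assumptions: `confusion_counts` is a hypothetical helper, the scores are invented, and a record is predicted as readmitted when its score is at or above the cutoff.

```python
def confusion_counts(y_true, scores, cutoff):
    """Return (false_negatives, false_positives) at a given cutoff."""
    false_negatives = sum(1 for y, s in zip(y_true, scores) if y == 1 and s < cutoff)
    false_positives = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= cutoff)
    return false_negatives, false_positives


# Invented labels (1 = readmitted) and predicted probabilities of readmission.
labels = [1, 1, 1, 0, 0, 1]
scores = [0.90, 0.48, 0.30, 0.60, 0.20, 0.46]

# Sweep a few candidate cutoffs and record the resulting error counts.
sweep = {cutoff: confusion_counts(labels, scores, cutoff) for cutoff in (0.50, 0.45, 0.25)}
```

Lowering the cutoff can only keep or reduce the number of false negatives and keep or increase the number of false positives, which is the trade-off weighed when a cutoff is chosen.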
  • 13. 7.0 Results and Conclusion
    Hospitals should use the Neural Network model to predict whether a patient will need to be readmitted within 30 days. The cutoff for the Neural Network model should be kept at 0.45, as this achieves the target of minimizing false negatives. Stratified sampling was used because of the rarity of the target variable, which means the accuracy of the model might be compromised.
    7.1 Business Value of the Model
    Our project provides health service providers with a predictive tool to identify patients at risk of readmission within 30 days, with an accuracy of 60.6%. From the model profiler we can see that the number of inpatient visits contributes to the risk of readmission, so hospitals should provide more care to those patients.
    Model cost with cutoff 0.45: false negatives = 591, false positives = 9471, so cost = (591 × $1,000 + 9471 × $200) = $2,485,200.
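The cost figure above is a weighted sum of the two error counts. A minimal sketch of the arithmetic, using the unit costs from the formula ($1,000 per false negative, $200 per false positive); the function name is hypothetical.

```python
def misclassification_cost(false_negatives, false_positives,
                           cost_per_fn=1000, cost_per_fp=200):
    """Total cost in dollars of the model's errors at a given cutoff."""
    return false_negatives * cost_per_fn + false_positives * cost_per_fp


# Error counts reported for the Neural Network model at cutoff 0.45.
total_cost = misclassification_cost(591, 9471)
```

Here the false negatives contribute $591,000 and the false positives $1,894,200, so the false positives dominate the total even though each one is assumed to be five times cheaper.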
  • 14. 8.0 References
    https://care.diabetesjournals.org/
    https://www.einstein.yu.edu/
    www.idf.org/millions-unite-diabetes-awareness-world-diabetes-day-2010
    https://www.kff.org/medicare/issue-brief/aiming-for-fewer-hospital-u-turns-the-medicare-hospital-readmission-reduction-program/
    https://yaledailynews.com/blog/2017/01/17/medicare-penalties-lead-to-decline-in-hospital-readmission-rates/