CIS-5210 HEALTHCARE DATA ANALYTICS
Diabetic Encounter Analysis
Monika Mishra
Sushant Burde
CIS 5210: Healthcare Data Analytics
Submitted to: Professor Shilpa Balan
Table of Contents
S. No.  Topic
1       DATA SET
          1. Data Set URL
          2. About the dataset
          3. Dataset details
          4. Column details
2       DATA REFINEMENT
          1. Removing duplicates
          2. Removing unwanted column
          3. Removing unwanted spaces
          4. Converting Text to Columns
3       ANALYSIS & VISUALIZATIONS
          1. Bar Chart
          2. Box Plot
          3. Line Chart
          4. Pie Chart
          5. Mosaic Plot
          6. Bar-Line Chart
4       STATISTICAL SUMMARY
5       STATISTICAL TEST
          1. One-Way Frequency
          2. Correlation Analysis
          3. T-Test
6       REFERENCES
DATA SET
1. Data Set URL:
https://www.kaggle.com/brandao/diabetes
2. About the dataset:
The data set represents 10 years (1999-2008) of clinical care at 130 US hospitals and
integrated delivery networks. It includes over 50 features representing patient and
hospital outcomes. Information was extracted from the database for encounters that
satisfied the following criteria.
• It is an inpatient encounter (a hospital admission).
• It is a diabetic encounter, that is, one during which any kind of diabetes was
  entered to the system as a diagnosis.
• The length of stay was at least 1 day and at most 14 days.
• Laboratory tests were performed during the encounter.
• Medications were administered during the encounter.
The data contains attributes such as patient number, race, gender, age, admission type,
time in hospital, medical specialty of the admitting physician, number of lab tests performed,
HbA1c test result, diagnosis, number of medications, diabetic medications, and the number of
outpatient, inpatient, and emergency visits in the year before the hospitalization.
3. Dataset details:
                     Original    Modified for the analysis
File size            19.2 MB     6 MB
Number of columns    55          15
Number of rows       101,767     68,379
File format          CSV         CSV
4. Column details:
The original dataset had 55 columns. For our analysis, we have reduced it to 15 columns.
The details of the columns are given below:
Column Name                   Column Detail
Encounter ID                  Unique identifier of an encounter
Patient number                Unique identifier of a patient
Race                          Patient’s race
Gender                        Patient’s gender
Age                           Patient’s age group
Time in hospital              Integer number of days between admission and discharge
Medical specialty             Department that treated the patient
Number of lab procedures      Number of lab tests performed during the encounter
Number of procedures          Number of procedures (other than lab tests) performed during
                              the encounter
Number of medications         Number of distinct generic names administered during the
                              encounter
Number of outpatient visits   Number of outpatient visits of the patient in the year
                              preceding the encounter
Number of emergency visits    Number of emergency visits of the patient in the year
                              preceding the encounter
Number of inpatient visits    Number of inpatient visits of the patient in the year
                              preceding the encounter
Number of diagnoses           Number of diagnoses entered to the system
Diabetes medications          Indicates if there was any diabetic medication prescribed
DATA REFINEMENT
Removing Duplicates
[Screenshots: Before, After, Process]
Explanation:
There were many duplicate rows in the dataset. We used the “Remove Duplicates” feature of
Excel to remove them. The “Remove Duplicates” feature can be found through the path
Data → Table Tools → Remove Duplicates.
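The same clean-up step can be sketched outside Excel. The snippet below is a minimal illustration in plain Python, not the procedure used for this report; the rows and the function name remove_duplicates are made up for the example.

```python
# Toy rows standing in for the encounter table (hypothetical values).
rows = [
    ("1001", "Caucasian", "Female"),
    ("1002", "AfricanAmerican", "Male"),
    ("1001", "Caucasian", "Female"),  # exact duplicate of the first row
]


def remove_duplicates(records):
    """Keep the first occurrence of each row, preserving the original order."""
    seen = set()
    unique = []
    for record in records:
        if record not in seen:
            seen.add(record)
            unique.append(record)
    return unique


deduped = remove_duplicates(rows)
print(len(rows), "rows before,", len(deduped), "rows after")
```

Like the Excel feature, this keeps the first copy of each repeated row and discards the later ones.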
Removing Unwanted Columns
[Screenshots: Before, After, Process]
Explanation:
Many columns held just one value and were not required for the visualizations, so we deleted
them. One of the deleted columns is “max_glu_serum”. To delete a column, we selected it,
right-clicked on it, and then clicked “Delete”.
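Single-valued columns can also be found programmatically. The sketch below assumes the table is held as a list of dictionaries; the sample values and the helper drop_constant_columns are hypothetical and only mirror the situation described above.

```python
# Hypothetical sample in which max_glu_serum holds one value in every row.
table = [
    {"race": "Caucasian", "gender": "Female", "max_glu_serum": "None"},
    {"race": "Asian", "gender": "Male", "max_glu_serum": "None"},
    {"race": "Hispanic", "gender": "Female", "max_glu_serum": "None"},
]


def drop_constant_columns(records):
    """Return copies of the records without columns that hold one distinct value."""
    columns = list(records[0].keys())
    keep = [c for c in columns if len({r[c] for r in records}) > 1]
    return [{c: r[c] for c in keep} for r in records]


cleaned = drop_constant_columns(table)
print(list(cleaned[0].keys()))
```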
Removing Unwanted Spaces
[Screenshots: Before, After, Process]
Explanation:
There were extra white spaces between the words in the medical specialty column. We created
a new column and used the formula builder to apply the TRIM function to the medical specialty
column. This removed the extra white spaces between the words.
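Excel's TRIM removes leading and trailing spaces and reduces runs of spaces between words to a single space. A rough Python equivalent is sketched below; the function name excel_like_trim and the sample strings are illustrative assumptions.

```python
def excel_like_trim(text):
    """Strip leading/trailing spaces and collapse inner runs of spaces to one."""
    return " ".join(part for part in text.split(" ") if part)


# Hypothetical medical specialty values with stray spaces.
specialties = ["  Internal   Medicine ", "Surgery-General", "Family  Practice"]
trimmed = [excel_like_trim(s) for s in specialties]
print(trimmed)
```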
Converting text to columns
[Screenshots: Before, After, Process]
Explanation:
Two fields, race and gender, were merged into one column. We used the “Convert Text to
Columns” wizard to separate them into two columns, using the comma as the delimiter. The
wizard can be found through the path Data → Text to Columns.
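The comma-delimited split can be illustrated in a few lines of Python. This is only a sketch; the merged values and the helper split_race_gender are invented for the example.

```python
# Hypothetical merged cells in the form "race,gender".
merged = ["Caucasian,Female", "AfricanAmerican,Male", "Asian,Female"]


def split_race_gender(value):
    """Split one merged cell into (race, gender) at the first comma."""
    race, gender = value.split(",", 1)
    return race.strip(), gender.strip()


# Unzip the pairs into two separate columns.
race_col, gender_col = zip(*(split_race_gender(v) for v in merged))
print(race_col)
print(gender_col)
```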
ANALYSIS & VISUALIZATIONS
1. Which race had the most diabetic encounters?
Chart used:
• Bar Chart
Analysis:
The bar chart shows the number of diabetic encounters for each race. The Caucasian race had
the largest number of diabetic encounters, 51,042. It is followed by the African American
race with 12,604 encounters. The Hispanic race has 1,372 encounters. The Asian race has the
fewest diabetic encounters.
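The heights of the bars are simply the number of rows per race. A minimal sketch of that count, using a short made-up list rather than the real column, could look like this:

```python
from collections import Counter

# Hypothetical race values, one per encounter.
races = [
    "Caucasian", "AfricanAmerican", "Caucasian", "Hispanic",
    "Caucasian", "Asian", "AfricanAmerican",
]

encounters_by_race = Counter(races)

# List the races from most to fewest encounters, as a bar chart would rank them.
for race, count in encounters_by_race.most_common():
    print(f"{race:<16}{count}")
```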
2. What are the statistics of the number of diagnoses?
Chart used:
• Box Plot
Analysis:
A box plot is a graphical rendition of statistical data based on the minimum, first quartile,
median, third quartile, and maximum. Here it shows the statistics for the number of diagnoses.
The mean is about 7.6 and the median is 9. The first quartile is 6 and the third quartile is
9. The minimum is 1 and the maximum is 16. These numbers are based on 68,379 observations of
the variable number of diagnoses.
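The values behind a box plot can be computed directly. The sketch below uses Python's statistics module on a small hypothetical sample; quartile definitions differ between tools, so the numbers a statistics package reports for the same data may not match exactly.

```python
import statistics

# Hypothetical number_diagnoses values, not the real column.
number_diagnoses = [1, 5, 6, 6, 8, 9, 9, 9, 9, 16]

# quantiles(..., n=4) returns the three cut points Q1, median, Q3.
q1, median, q3 = statistics.quantiles(number_diagnoses, n=4, method="inclusive")

box_summary = {
    "min": min(number_diagnoses),
    "q1": q1,
    "median": median,
    "q3": q3,
    "max": max(number_diagnoses),
    "mean": statistics.mean(number_diagnoses),
}
print(box_summary)
```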
3. Which age group has the highest number of inpatient encounters?
Chart used:
• Line Chart
Analysis:
The line chart shows the frequency of inpatient encounters by age group. The age group 70-80
had the highest number of inpatient encounters, followed by the age group 60-70. The age
group 0-10 had the fewest. In general, the number of encounters increases with age up to the
70-80 group. After that, a decline is observed.
4. Which medical specialty was involved in the highest number of patient encounters?
Chart used:
• Pie Chart
Analysis:
Pie charts show the relative contribution of the parts to the whole. The size of a slice
represents the contribution of the data to the total chart statistic. The Internal Medicine
department had the highest number of diabetic patient encounters. The Surgery-General
department had the fewest.
5. Are there more females than males who take diabetic medicines?
Chart used:
• Mosaic Plot
Analysis:
Mosaic plots display tiles that correspond to the crosstabulation table cells. The areas of the
tiles are proportional to the frequencies of the table cells. Most males and females
admitted to the hospitals take diabetic medicines. The number of females who take diabetic
medicines is lower than the number of males.
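The crosstabulation behind a mosaic plot is a count of rows for each combination of the two variables. A small sketch with invented (gender, diabetes medication) pairs:

```python
from collections import Counter

# Hypothetical (gender, diabetes medication) pairs, one per encounter.
records = [
    ("Female", "Yes"), ("Male", "Yes"), ("Female", "No"),
    ("Male", "Yes"), ("Female", "Yes"), ("Male", "No"),
]

crosstab = Counter(records)
total = sum(crosstab.values())

# Each tile's area in the mosaic plot is proportional to its cell's share.
for (gender, medication), n in sorted(crosstab.items()):
    print(f"{gender:<8}{medication:<5}{n:>3}{n / total:>8.1%}")
```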
6. Which race accounts for the maximum and minimum number of inpatient and
outpatient visits?
Chart used:
• Bar-Line Chart
Analysis:
The chart displays the number of outpatient visits and the number of inpatient visits grouped
by race. The Caucasian race has the highest values for both outpatient and inpatient visits.
The Asian race has the lowest values for both.
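Grouping the two visit counts by race amounts to a sum per group. The sketch below shows the idea on made-up rows; the tuple layout (race, outpatient visits, inpatient visits) is an assumption for the example.

```python
from collections import defaultdict

# Hypothetical rows: (race, number of outpatient visits, number of inpatient visits).
visits = [
    ("Caucasian", 2, 1),
    ("Asian", 0, 0),
    ("Caucasian", 1, 3),
    ("AfricanAmerican", 0, 2),
    ("Asian", 1, 0),
]

# totals[race] holds [outpatient_sum, inpatient_sum].
totals = defaultdict(lambda: [0, 0])
for race, outpatient, inpatient in visits:
    totals[race][0] += outpatient
    totals[race][1] += inpatient

top_inpatient = max(totals, key=lambda r: totals[r][1])
low_inpatient = min(totals, key=lambda r: totals[r][1])
print(dict(totals), top_inpatient, low_inpatient)
```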
STATISTICAL SUMMARY
Analysis:
Statistic                      Value    Meaning
Mean                           4.28     The average time spent in hospital: the sum of all
                                        time-in-hospital values divided by the total number
                                        of observations (68,379)
Std Dev (Standard Deviation)   2.92     Indicates the extent of deviation for the time spent
                                        in hospital. In this case, it is close to the mean.
Minimum                        1        The lowest value of the time spent in hospital
Maximum                        14       The highest value of the time spent in hospital
Median                         4        The middle number in a sequence of numbers when it
                                        is ordered by rank
N                              68,379   The total number of observations, or the total
                                        number of rows in the table
We took the time spent in hospital as the analysis variable. The table above shows its
statistical summary with an explanation of each statistic.
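The same six statistics can be reproduced with Python's statistics module. The sample below is hypothetical and much smaller than the real column, so its values will not match the table; it only shows how each statistic is obtained.

```python
import statistics

# Hypothetical time_in_hospital values in days.
time_in_hospital = [1, 2, 2, 3, 4, 4, 5, 7, 14]

summary = {
    "N": len(time_in_hospital),
    "Mean": statistics.mean(time_in_hospital),
    "Std Dev": statistics.stdev(time_in_hospital),  # sample standard deviation (n - 1)
    "Minimum": min(time_in_hospital),
    "Maximum": max(time_in_hospital),
    "Median": statistics.median(time_in_hospital),
}

for name, value in summary.items():
    print(f"{name:<10}{value}")
```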
STATISTICAL TESTS
1. One-Way Frequency
Analysis:
For the one-way frequency analysis, we took gender as the analysis variable and the number of
inpatient visits as the frequency count. We want to know which gender had more inpatient
encounters.
From the table and the “Distribution of gender” graph, it can be seen that the number of
inpatient visits is higher for the female gender than for the male gender. The female gender
has a frequency count of 24,985, which is 55.13%, while the male gender has 20,339, which is
44.87%.
Cumulative frequency is defined as a running total of frequencies. The frequency of an
element in a set refers to how many of that element there are in the set. Cumulative frequency
can also be defined as the sum of all previous frequencies up to the current point.
The cumulative frequency is important when analyzing data, because its value indicates the
number of elements in the data set that lie at or below the current value.
The cumulative frequency adds up to the total number of observations, which in this case is
45,324. The cumulative percentage is always 100% for the last group, which in our analysis
is the Male gender. The “Cumulative Distribution of gender” graph displays the
cumulative frequency distribution.
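The percentages and running totals can be checked from the two frequency counts quoted in this analysis. The short sketch below recomputes them; only the counts come from the report, the variable names are illustrative.

```python
from itertools import accumulate

# Frequency counts quoted in the analysis above.
counts = {"Female": 24985, "Male": 20339}

total = sum(counts.values())
cumulative = list(accumulate(counts.values()))  # running total of frequencies

# Columns: gender, frequency, percent, cumulative frequency, cumulative percent.
for (gender, freq), cum in zip(counts.items(), cumulative):
    print(f"{gender:<8}{freq:>7}{freq / total:>9.2%}{cum:>8}{cum / total:>9.2%}")
```

The last cumulative frequency equals the total of 45,324, which is why the last cumulative percentage is 100%.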
2. Correlation Analysis
Analysis:
Correlation analysis provides statistics for investigating associations among
variables. Here the correlation is computed for the variables time_in_hospital and
number_diagnoses, and its value is 0.21469. This means the two variables are weakly
positively correlated. A value close to 1 signifies a strong positive correlation.
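Assuming the reported value is a Pearson correlation coefficient, it can be computed as the covariance of the two variables divided by the product of their spreads. The sketch below uses made-up values, so it does not reproduce 0.21469; the function name pearson_r is illustrative.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical paired observations.
time_in_hospital = [1, 3, 4, 2, 7, 5]
number_diagnoses = [5, 9, 6, 9, 8, 9]

r = pearson_r(time_in_hospital, number_diagnoses)
print(round(r, 5))
```

The coefficient always lies between -1 and 1; values near 0 indicate a weak linear relationship.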
3. T-Test
Analysis:
A t-test is a type of inferential statistic used to determine whether there is a significant
difference between the means of two groups, which may be related in certain features.
A t-test is used as a hypothesis-testing tool, which allows testing of an assumption
applicable to a population.
For our analysis, we used a one-sample t-test with time_in_hospital as the
analysis variable. A one-sample t-test compares the mean of the sample to the null
hypothesis mean.
The output also reports the Kolmogorov-Smirnov, Cramer-von Mises, and Anderson-Darling
tests. These are goodness-of-fit tests for normality rather than tests of the mean.
For the Kolmogorov-Smirnov test, p<alpha (p<0.0100). For the Cramer-von Mises and
Anderson-Darling tests, the p value is likewise less than the corresponding alpha value
(p<0.0050). Therefore, the distribution of time_in_hospital differs significantly from a
normal distribution.
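The one-sample t statistic is the difference between the sample mean and the null hypothesis mean, divided by the standard error of the mean. The sketch below is illustrative only: the sample and the null mean of 4.0 are assumptions, since the report does not state the null value that was tested.

```python
import math
import statistics


def one_sample_t(sample, null_mean):
    """Return (t statistic, degrees of freedom) for a one-sample t-test."""
    n = len(sample)
    mean = statistics.mean(sample)
    std_err = statistics.stdev(sample) / math.sqrt(n)  # standard error of the mean
    return (mean - null_mean) / std_err, n - 1


# Hypothetical time_in_hospital values and a hypothetical null mean.
time_in_hospital = [1, 2, 2, 3, 4, 4, 5, 7, 14]
t_stat, dof = one_sample_t(time_in_hospital, null_mean=4.0)
print(round(t_stat, 3), dof)
```

The p-value is then read from the t distribution with n - 1 degrees of freedom.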
REFERENCES
https://www.kaggle.com/brandao/diabetes
https://www.wyzant.com/resources/lessons/math/statistics_and_probability/averages/cumulative_frequency_percentiles_and_quartiles

Diabetic Encounter Analysis using SAS studio

  • 1. CIS-5210 HEALTHCARE DATA ANALYTICS 1 Diabetic Encounter Analysis Monika Mishra Sushant Burde CIS 5210: Healthcare Data Analytics Submitted to: Professor Shilpa Balan
  • 2. CIS-5210 HEALTHCARE DATA ANALYTICS 2 Table of Contents S. No. Topic Page No. 1 DATA SET 1. Data Set URL 2. About the dataset 3. Dataset details 4. Column details 3 3 4 4-5 2 DATA REFINEMENT 1. Removing duplicates 2. Removing unwanted column 3. Removing unwanted spaces 4. Converting Text to Columns 6 7 8 9 3 ANALYSIS & VISUALIZATIONS 1. Bar Chart 2. Box Plot 3. Line Chart 4. Pie Chart 5. Mosaic Plot 6. Bar-Line Chart 10-11 12-13 14-15 16-17 18-19 20-21 4 STATISTICAL SUMMARY 22-23 5 STATISTICAL TEST 1. One-Way Frequency 2. Correlation Analysis 3. T-Test 24-26 27 28-29 6 REFERENCES 30
  • 3. CIS-5210 HEALTHCARE DATA ANALYTICS 3 DATA SET 1. Data Set URL: https://www.kaggle.com/brandao/diabetes 2. About the dataset: The data set represents 10 years (1999-2008) of clinical care at 130 US hospitals and integrated delivery networks. It includes over 50 features representing patient and hospital outcomes. Information was extracted from the database for encounters that satisfied the following criteria.  It is an inpatient encounter (a hospital admission).  It is a diabetic encounter, that is, one during which any kind of diabetes was entered to the system as a diagnosis.  The length of stay was at least 1 day and at most 14 days.  Laboratory tests were performed during the encounter.  Medications were administered during the encounter. The data contains such attributes as patient number, race, gender, age, admission type, time in hospital, medical specialty of admitting physician, number of lab test performed, HbA1c test result, diagnosis, number of medication, diabetic medications, number of outpatient, inpatient, and emergency visits in the year before the hospitalization, etc. 3. Dataset details:
  • 4. CIS-5210 HEALTHCARE DATA ANALYTICS 4 Original File Size 19.2 MB Number of columns 55 Number of rows 101767 File format CSV Modified for the analysis File Size 6 MB Number of columns 15 Number of rows 68379 File format CSV 4. Column details: The original dataset had 55 columns. For our analysis, we have reduced it to 15 columns. The details of the columns are given below: Column Name Column Detail Encounter ID Unique identifier of an encounter Patient number Unique identifier of a patient Race Patient’s race Gender Patient’s gender Age Patient’s age group
  • 5. CIS-5210 HEALTHCARE DATA ANALYTICS 5 Time in hospital Integer number of days between admission and discharge Medical specialty Treated by which department Number of laboratories procedures Number of lab tests performed during the encounter Number of procedures Number of procedures (other than lab tests) performed during the encounter Number of medications Number of distinct generic names administered during the encounter Number of outpatients visits Number of outpatient visits of the patient in the year preceding the encounter Number of emergency visits Number of emergency visits of the patient in the year preceding the encounter Number of inpatients visits Number of inpatient visits of the patient in the year preceding the encounter Number of diagnoses Number of diagnoses entered to the system Diabetes medications Indicates if there was any diabetic medication prescribed
  • 6. DATA REFINEMENT
      Removing Duplicates
      Before / After
      Process Explanation: There were many duplicate rows in the dataset. We used Excel’s “Remove Duplicates” feature to remove them. The feature can be found through the path Data > Table Tools > Remove Duplicates.
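For readers working outside Excel, the same de-duplication step can be sketched in Python. This is only an illustration under assumptions: the sample rows and the `remove_duplicates` helper are hypothetical and are not part of the original analysis, which was done in Excel. Like Excel's feature, the sketch keeps the first occurrence of each row.

```python
# Hypothetical sample rows: (encounter_id, patient_number, race).
rows = [
    (1001, 501, "Caucasian"),
    (1002, 502, "AfricanAmerican"),
    (1001, 501, "Caucasian"),  # exact duplicate of the first row
    (1003, 503, "Hispanic"),
]


def remove_duplicates(records):
    """Keep the first occurrence of each row, preserving the original order."""
    seen = set()
    unique = []
    for record in records:
        if record not in seen:
            seen.add(record)
            unique.append(record)
    return unique


deduplicated = remove_duplicates(rows)
print(len(rows), "rows before,", len(deduplicated), "rows after")
```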
  • 7. Removing Unwanted Columns
      Before / After
      Process Explanation: Many columns had just one value and were not required for the visualizations, so we deleted them. One of the deleted columns is “max_glu_serum”. We selected the column, right-clicked on it, and then clicked “Delete”.
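The idea of finding columns that hold a single value and dropping them can also be expressed in a few lines of Python. The sketch below is an assumption-laden illustration, not the authors' procedure: the sample rows, the value "None", and the helper names `constant_columns` and `drop_columns` are all hypothetical.

```python
# Hypothetical sample: each dict is one encounter row.
rows = [
    {"encounter_id": 1, "gender": "Female", "max_glu_serum": "None"},
    {"encounter_id": 2, "gender": "Male", "max_glu_serum": "None"},
    {"encounter_id": 3, "gender": "Female", "max_glu_serum": "None"},
]


def constant_columns(records):
    """Return the names of columns that contain only one distinct value."""
    columns = list(records[0].keys())
    return [c for c in columns if len({r[c] for r in records}) == 1]


def drop_columns(records, names):
    """Return copies of the rows without the named columns."""
    return [{k: v for k, v in r.items() if k not in names} for r in records]


to_drop = constant_columns(rows)
cleaned = drop_columns(rows, to_drop)
print("Dropped:", to_drop)
```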
  • 8. Removing Unwanted Spaces
      Before / After
      Process Explanation: There were extra white spaces between the words in the medical specialty column. We created a new column and used the formula builder to apply the TRIM function to the medical specialty column. This removed the leading and trailing spaces and reduced the extra spaces between words to a single space.
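To make the behaviour of TRIM concrete, here is a small Python sketch that mimics it: leading and trailing spaces are removed and runs of spaces between words collapse to one. The function name `excel_trim` and the sample values are hypothetical, chosen only for illustration.

```python
def excel_trim(text):
    """Mimic Excel's TRIM: strip both ends and collapse runs of spaces to one."""
    # Splitting on a single space yields empty strings for each extra space;
    # filtering them out and re-joining leaves exactly one space between words.
    return " ".join(part for part in text.split(" ") if part)


# Hypothetical medical specialty values with stray spaces.
raw_values = ["  Surgery-General ", "Internal    Medicine", "Cardiology"]
trimmed_values = [excel_trim(v) for v in raw_values]
print(trimmed_values)
```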
  • 9. Converting Text to Columns
      Before / After
      Process Explanation: Two columns, race and gender, were merged into one column. We used the “Convert Text to Columns” wizard to separate the two details into two columns, using a comma as the delimiter. The wizard can be found through the path Data > Text to Columns.
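Splitting a merged field on a comma delimiter looks like this in Python. This is a minimal sketch under assumptions: the merged values and the helper `split_race_gender` are hypothetical, and the real column may contain values this sketch does not handle (for example, a missing gender).

```python
# Hypothetical merged values in the form "race,gender".
merged = ["Caucasian,Female", "AfricanAmerican,Male", "Asian,Female"]


def split_race_gender(value, delimiter=","):
    """Split one merged cell into (race, gender) at the first delimiter."""
    race, gender = value.split(delimiter, maxsplit=1)
    return race.strip(), gender.strip()


separated = [split_race_gender(v) for v in merged]
races = [race for race, _ in separated]
genders = [gender for _, gender in separated]
print(races)
print(genders)
```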
  • 10. ANALYSIS & VISUALIZATIONS
      1. Which race had the most diabetic encounters?
      Chart used: Bar Chart
      Analysis: The bar chart shows the number of diabetic encounters for each race. The Caucasian race had the most diabetic encounters, with 51,042. It is followed by the African American race with 12,604 and the Hispanic race with 1,372. The Asian race had the fewest diabetic encounters.
  • 11. Full Screenshot:
  • 12. 2. What are the summary statistics for the number of diagnoses?
      Chart used: Box Plot
      Analysis: A box plot is a graphical rendition of statistical data based on the minimum, first quartile, median, third quartile, and maximum. Here it shows the statistics for the number of diagnoses. The mean is about 7.6 and the median is 9. The first quartile is 6 and the third quartile is 9. The minimum value is 1 and the maximum value is 16. These numbers are based on all 68,379 observations of the variable number of diagnoses.
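The five numbers behind a box plot can be computed with Python's standard `statistics` module. The sketch below uses a small hypothetical sample, not the real data, and assumes the "inclusive" quartile method; statistical packages use differing quartile conventions, so results on real data may differ slightly from those in the chart.

```python
import statistics

# Hypothetical sample of number-of-diagnoses values (not the real dataset).
diagnoses = [1, 4, 6, 6, 7, 8, 9, 9, 9, 9, 16]


def five_number_summary(values):
    """Return the minimum, quartiles, and maximum used to draw a box plot."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
    }


summary = five_number_summary(diagnoses)
print(summary)
```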
  • 13. Full Screenshot:
  • 14. 3. Which age group has the most inpatient encounters?
      Chart used: Line Chart
      Analysis: The line chart shows the frequency of inpatient encounters by age group. The 70-80 age group had the most inpatient encounters, followed by the 60-70 age group. The 0-10 age group had the fewest. In general, the number of encounters increases with age up to the 70-80 age group. After that, a decline is observed.
  • 15. Full Screenshot:
  • 16. 4. Which medical specialty was involved in the most patient encounters?
      Chart used: Pie Chart
      Analysis: Pie charts show the relative contribution of the parts to the whole. The size of a slice represents the contribution of that category to the total. The Internal Medicine department had the most diabetic patient encounters. The Surgery-General department had the fewest.
  • 17. Full Screenshot:
  • 18. 5. Are there more females than males who take diabetic medicines?
      Chart used: Mosaic Plot
      Analysis: Mosaic plots display tiles that correspond to the cells of a crosstabulation table. The areas of the tiles are proportional to the frequencies of the table cells. Most males and most females admitted to the hospitals take diabetic medicines. The number of females who take diabetic medicines is lower than the number of males.
  • 19. Full Screenshot:
  • 20. 6. Which race accounts for the maximum and minimum number of inpatient and outpatient visits?
      Chart used: Bar-Line Chart
      Analysis: The chart displays the number of outpatient visits and the number of inpatient visits grouped by race. The Caucasian race has the highest value for both. The Asian race has the lowest value for both.
  • 21. Full Screenshot:
  • 22. STATISTICAL SUMMARY
      Analysis:
          Statistic (Value): Meaning
          Mean (4.28): The average time spent in hospital. It is the sum of the time spent in hospital over all observations divided by the total number of observations (68379).
          Std Dev, Standard Deviation (2.92): Indicates how far the time spent in hospital typically deviates from the mean. In this case, its value (2.92) is close to the mean (4.28).
          Minimum (1): The lowest value of the time spent in hospital.
          Maximum (14): The highest value of the time spent in hospital.
          Median (4): The middle number in the sequence of values when they are ordered by rank.
          N (68379): The total number of observations, that is, the total number of rows in the table.
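Each statistic in the summary can be reproduced with Python's standard `statistics` module. The values below are a hypothetical sample of time-in-hospital figures, not the real data, so the output will not match the table above. The sketch assumes the sample standard deviation (divisor n - 1); the source does not state which form its tool reports.

```python
import statistics

# Hypothetical sample of time_in_hospital values in days (not the real dataset).
time_in_hospital = [1, 2, 3, 4, 4, 5, 7, 14]

summary = {
    "mean": statistics.mean(time_in_hospital),        # sum of values / N
    "std_dev": statistics.stdev(time_in_hospital),    # sample standard deviation
    "minimum": min(time_in_hospital),
    "maximum": max(time_in_hospital),
    "median": statistics.median(time_in_hospital),    # middle value when sorted
    "n": len(time_in_hospital),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```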
  • 23. We have taken the time spent in hospital as the analysis variable. The table above shows the statistical summary with explanations. Full Screenshot:
  • 24. STATISTICAL TESTS 1. One-Way Frequency
  • 25. Analysis: For the one-way frequency analysis, we took gender as the analysis variable and the number of inpatient visits as the frequency count. We wanted to know which gender had more inpatient encounters. From the table and the “Distribution of gender” graph, it can be seen that the number of inpatient visits is higher for the female gender than for the male gender. The female gender has a frequency count of 24,985 (55.13%), while the male gender has 20,339 (44.87%). The frequency of an element in a set is the number of times that element occurs in the set. Cumulative frequency is a running total of frequencies, that is, the sum of all frequencies up to and including the current point.
  • 26. The cumulative frequency is important when analyzing data because it indicates the number of elements in the data set that lie at or below the current value. The cumulative frequency adds up to the total number of observations, which in this case is 45,324. The cumulative percentage is always 100% for the last group, which in our analysis is the Male gender. The “Cumulative Distribution of gender” graph displays the cumulative frequency distribution. Full Screenshot:
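The percentages and cumulative columns of a one-way frequency table follow directly from the counts. As a worked check, the Python sketch below rebuilds them from the two frequency counts reported in the analysis (24,985 and 20,339). The table layout and the field names are assumptions made for illustration.

```python
# Frequency counts reported in the analysis, in the order the groups appear.
counts = {"Female": 24985, "Male": 20339}

total = sum(counts.values())
table = []
running_total = 0
for level, frequency in counts.items():
    running_total += frequency  # cumulative frequency is a running total
    table.append({
        "level": level,
        "frequency": frequency,
        "percent": round(100 * frequency / total, 2),
        "cumulative_frequency": running_total,
        "cumulative_percent": round(100 * running_total / total, 2),
    })

for row in table:
    print(row)
```

The last row's cumulative frequency equals the total number of observations, and its cumulative percentage is 100.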
  • 27. 2. Correlation Analysis
      Analysis: Correlation analysis provides statistics for investigating associations among variables. Here the correlation analysis is performed for the variables time_in_hospital and number_diagnoses, and the correlation coefficient is 0.21469. This means the two variables are weakly positively correlated. A value close to 1 or -1 signifies a strong correlation, while a value close to 0 signifies a weak one. Full Screenshot:
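A correlation coefficient of this kind is commonly the Pearson coefficient: the covariance of the two variables divided by the product of their standard deviations. The source does not name the coefficient, so Pearson is an assumption here. The sketch below implements that formula in plain Python; the function name `pearson_r` and the sample values are hypothetical and are not taken from the dataset.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: co-variation of x and y scaled to the range -1..1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariation / math.sqrt(spread_x * spread_y)


# Hypothetical paired observations (time_in_hospital, number_diagnoses).
time_in_hospital = [1, 3, 4, 7, 14]
number_diagnoses = [9, 5, 9, 6, 9]

r = pearson_r(time_in_hospital, number_diagnoses)
print(f"r = {r:.5f}")
```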
  • 28. 3. T-Test
      Analysis: A t-test is a type of inferential statistic used to determine whether there is a significant difference between means, for example between the means of two groups, or between a sample mean and a hypothesized value. It is used as a hypothesis testing tool, which allows an assumption about a population to be tested. For our analysis, we used a one-sample t-test with time_in_hospital as the analysis variable. A one-sample t-test compares the mean of the sample to the null hypothesis mean. For the Kolmogorov-Smirnov test, p < alpha (p < 0.0100), so the result is significant. For the Cramer-von Mises and Anderson-Darling tests, the p-value is also less than the corresponding alpha value (p < 0.0050). These three tests are goodness-of-fit tests for normality, so what they show is that the distribution of time_in_hospital differs significantly from a normal distribution; they do not by themselves show that its mean differs from the null hypothesis mean.
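For reference, the one-sample t statistic is the difference between the sample mean and the null hypothesis mean, divided by the standard error of the mean. The Python sketch below applies that formula to the summary figures reported earlier in this report (mean 4.28, standard deviation 2.92, N 68379). The null hypothesis mean of 4.0 is purely an assumption for illustration, since the report does not state the value that was tested, and the helper `one_sample_t` is hypothetical.

```python
import math


def one_sample_t(sample_mean, sample_sd, n, null_mean):
    """t = (sample mean - null mean) / (sample sd / sqrt(n))."""
    standard_error = sample_sd / math.sqrt(n)
    return (sample_mean - null_mean) / standard_error


# Summary figures from the statistical summary; null_mean is a hypothetical choice.
t_statistic = one_sample_t(sample_mean=4.28, sample_sd=2.92, n=68379, null_mean=4.0)
degrees_of_freedom = 68379 - 1
print(f"t = {t_statistic:.2f} with {degrees_of_freedom} degrees of freedom")
```

With a sample this large, even a small gap between the sample mean and the null mean produces a large t statistic, because the standard error shrinks with the square root of N.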
  • 29. Full Screenshot:
  • 30. REFERENCES
      https://www.kaggle.com/brandao/diabetes
      https://www.wyzant.com/resources/lessons/math/statistics_and_probability/averages/cumulative_frequency_percentiles_and_quartiles