SlideShare a Scribd company logo
1 of 46
Download to read offline
1
Big Data Analytics for
Healthcare
Chandan K. Reddy
Department of Computer Science
Wayne State University
Tutorial presentation at SIGKDD 2013.
The updated tutorial slides are available at http://dmkd.cs.wayne.edu/TUTORIAL/Healthcare/
Jimeng Sun
Healthcare Analytics Department
IBM TJ Watson Research Center
2
Motivation
Hersh, W., Jacko, J. A., Greenes, R., Tan, J., Janies, D., Embi, P. J., & Payne, P. R. (2011). Health-care hit or miss? Nature, 470(7334), 327.
In 2012, worldwide digital healthcare
data was estimated to be equal to 500
petabytes and is expected to reach
25,000 petabytes in 2020.
Can we learn from the past to
become better in the future ??
Healthcare Data is
becoming more complex !!
3
Organization of this Tutorial
 Introduction
 Clinical Predictive Modeling
– Case Study: Readmission Prediction
 Scalable Healthcare Analytics Platform
 Genetic Data Analysis
 Conclusion
4
What is Big Data
 Large and complex data sets which are difficult to process using
traditional database technology.
The four dimensions (V’s) of Big Data
Big data is not just about size.
• Finds insights from complex, noisy,
heterogeneous, longitudinal, and
voluminous data.
• It aims to answer questions that
were previously unanswered.
BIG
DATA
Velocity
Veracity
Variety
Volume
5
Healthcare Analytics in the Electronic Era
 Old way: Data are expensive and small
– Input data are from clinical trials, which is small
and costly
– Modeling effort is small since the data is limited
 EHR era: Data are cheap and large
– Broader patient population
– Noisy data
– Heterogeneous data
– Diverse scale
– Longitudinal records
6
GOAL: Provide Personalized care through right intervention to
the right patient at the right time.
Electronic
Health Records
Evidence
+ Insights
Improved outcomes
through smarter decisions
Lower costs
Big Data
Analytics
Overall Goals of Big Data Analytics in Healthcare
BehavioralGenomic
Public Health
7
Examples for Big Data Analytics in Healthcare
 Medicare Penalties: Medicare penalizes hospitals that have high rates of
readmissions among patients with Heart failure, Heart attack, Pneumonia.
 BRAIN Initiative: Find new ways to treat, cure, and even prevent brain
disorders, such as Alzheimer’s disease, epilepsy, and traumatic brain
injury. A new bold $100 million research initiative designed to revolutionize
our understanding of the human brain.
 Heritage Health Prize: Develop algorithms to predict the number of days
a patient will spend in a hospital in the next year. http://www.heritagehealthprize.com
 GE Head Health Challenge: Methods for Diagnosis and Prognosis of Mild
Traumatic Brain Injuries. Develop Algorithms and Analytical Tools, and
Biomarkers and other technologies. A total of $60M in awards.
Industry Initiatives
Government Initiatives
8
Data Collection and Analysis
Jensen, Peter B., Lars J. Jensen, and Søren Brunak. "Mining electronic health records: towards better
research applications and clinical care." Nature Reviews Genetics (2012).
Effectively integrating and efficiently analyzing various forms of healthcare data over
a period of time can answer many of the impending healthcare problems.
9
PREDICTION MODELS FOR
CLINICAL DATA ANALYSIS
10
Different Kinds of Outcomes
Diagnostic
Presence of
a disease
What
disease
How
serious
Economic Cost
Survival analysis
…
Binary Outcomes
Categorical Outcomes
Ordinal Outcomes
Continuous Outcomes
Survival Outcomes
11
Examples for Different Outcomes in Healthcare
 Binary Outcomes
– Death: yes/no
– Adverse event: yes/no
 Continuous Outcomes
– Days of Hospital stay
– Visual analogue score
 Ordinal Outcomes
– Quality of life scale
– Grade of tumour progression
– Count data (Number of heart attacks)
 Survival Outcomes
– Cancer survival
– Clinical Trials
12
Continuous Outcomes
 Linear Regression
The Outcome y is assumed to be the linear combination of the
variables with the estimated regression coefficients .
= +
the coefficients usually estimated by minimize the RSS (“residual sum
of squares”). Various penalized RSS methods are used to shrink the ,
and achieve a more stabilized model.
= ( − ) ( − Β)
The minimum of the sum of squares is
found by setting the gradient to zero.
= ( )
 Generalized Additive Model (GAM)
= + +
Where refers to the intercept, refers to functions for each predictor
which is more flexible.
13
Binary Outcomes
 Logistic Regression
The model is stated in terms of the probability that y=1, rather than the
outcomes Y directly, it can be viewed as a linear function in the logistic
transformation:
Pr = 1 =
1
1 +
where = ∑ , the coefficients usually estimated by maximum
likelihood in a standard logistic regression approach.
14
Binary Outcomes
 Bayes Modeling
The predict is given based on the Bayes theorem:
=
∙ ( )
( )
P(X) is always constant, and the prior probability P(D) can be easily
calculated from the training set. Two ways to estimate the class-conditional
probabilities : naϊve Bayes and Bayesian belief network.
naϊve Bayes: assuming the attributes are conditionally independent, so
= ( | )
15
Binary Outcomes
 Classification and Regression Trees
Recursively split the patients into smaller
subgroups. Splits are based on cut-off levels
of the predictors, which can maximize the
difference between two subgroups, and
minimize the variability within these
subgroups.
Some other commonly used binary outcomes models:
Multivariate additive regression splines (MARS) models
Support Vector Machine (SVM)
Neural Nets …
16
Categorical Outcomes
 Polytomous Logistic Regression
The model for j outcomes categories can be written as:
= . = = + , ,
One vs all approach. Similar to multi-class prediction.
j-1 models are fitted and combined to the prediction.
Ordinal Outcomes
 Proportional Odds Logistic Regression
A common set of regression coefficients is assumed across all
levels of the outcome, and intercepts are estimated for each level.
= +
17
Survival Outcomes
 Survival Analysis
Survival Analysis typically focuses on
time to event data. Typically, survival
data are not fully observed, but rather are
censored. In survival analysis, the
primary focus is on the survival function.
= Pr ≥
where is the Failure time or the time
that a event happens. So the means
the probability that the instance can
survive for longer than a certain time t.
The censoring variable is the time of
withdrawn, lost, or study end time.
The hazard function: the probability the
“event” of interest occurs in the next
instant, given survival to time t.
Death
Right censored
18
Survival Outcomes
 Non-Parametric Approaches
– Kaplan-Meier Analysis
– Clinical Life Tables
– Estimates the Survival Function
 Semi-Parametric Approaches
– Cox Proportional Hazards Model
– Estimates the Hazard Function
20
Survival Outcomes
 Kaplan-Meier Analysis
Kaplan-Meier analysis is a nonparametric approach to survival
outcomes. The survival function is:
= (1 − )
:
where
• … is a set of distinct death times
observed in the sample.
• is the number of deaths at .
• is the number of censored
observations between and .
• is the number of individuals “at risk”
right before the death.
= − −
Efron, Bradley. "Logistic regression, survival analysis, and the Kaplan-Meier curve." Journal
of the American Statistical Association 83.402 (1988): 414-425.
21
Survival Outcomes
Patient Days Status
1 21 1
2 39 1
3 77 1
4 133 1
5 141 2
6 152 1
7 153 1
8 161 1
9 179 1
10 184 1
11 197 1
12 199 1
13 214 1
14 228 1
Patient Days Status
15 256 2
16 260 1
17 261 1
18 266 1
19 269 1
20 287 3
21 295 1
22 308 1
23 311 1
24 321 2
25 326 1
26 355 1
27 361 1
28 374 1
Patient Days Status
29 398 1
30 414 1
31 420 1
32 468 2
33 483 1
34 489 1
35 505 1
36 539 1
37 565 3
38 618 1
39 793 1
40 794 1
Example
Status
1: Death
2: Lost to follow up
3: Withdrawn Alive
22
Kaplan-Meier Analysis
Kaplan-Meier Analysis
Time Status ( )
1 21 1 1 0 40 0.975
2 39 1 1 0 39 0.95
3 77 1 1 0 38 0.925
4 133 1 1 0 37 0.9
5 141 2 0 1 36 .
6 152 1 1 0 35 0.874
7 153 1 1 0 34 0.849
= (1 − )
:
K_M Estimator:
23
Kaplan-Meier Analysis
K_M Estimator:
Time Status
( )
∑ Time Status
( )
∑Estimate Sdv Error Estimate Sdv Error
1 21 1 0.975 0.025 1 40 21 287 3 . . 18 20
2 39 1 0.95 0.034 2 39 22 295 1 0.508 0.081 19 19
3 77 1 0.925 0.042 3 38 23 308 1 0.479 0.081 20 18
4 133 1 0.9 0.047 4 37 24 311 1 0.451 0.081 21 17
5 141 2 . . 4 36 25 321 2 . . 21 16
6 152 1 0.874 0.053 5 35 26 326 1 0.421 0.081 22 15
7 153 1 0.849 0.057 6 34 27 355 1 0.391 0.081 23 14
8 161 1 0.823 0.061 7 33 28 361 1 0.361 0.08 24 13
9 179 1 0.797 0.064 8 32 29 374 1 0.331 0.079 25 12
10 184 1 0.771 0.067 9 31 30 398 1 0.301 0.077 26 11
11 193 1 0.746 0.07 10 30 31 414 1 0.271 0.075 27 10
12 197 1 0.72 0.072 11 29 32 420 1 0.241 0.072 28 9
13 199 1 0.694 0.074 12 28 33 468 2 . . 28 8
14 214 1 0.669 0.075 13 27 34 483 1 0.206 0.07 29 7
15 228 1 0.643 0.077 14 26 35 489 1 0.172 0.066 30 6
16 256 2 . . 14 25 36 505 1 0.137 0.061 31 5
17 260 1 0.616 0.078 15 24 37 539 1 0.103 0.055 32 4
18 261 1 0.589 0.079 16 23 38 565 3 . . 32 3
19 266 1 0.563 0.08 17 22 39 618 1 0.052 0.046 33 2
20 269 1 0.536 0.08 18 21 40 794 1 0 0 34 1
24
Clinical Life Tables
 Clinical life tables applies to grouped survival data from studies in
patients with specific diseases, it focuses more on the conditional
probability of dying within the interval.
the time interval is [ , ) VS.
… is a set of distinct death
timesThe survival function is:
= (1 − )
K_M analysis suits small data set with a more accurate analysis,
Clinical life table suit for large data set with a relatively approximate result.
nonparametric
Assumption:
• at the beginning of each interval: = −
• at the end of each interval: =
• on average halfway through the interval: = − /2
Cox, David R. "Regression models and life-tables", Journal of the Royal Statistical Society.
Series B (Methodological) (1972): 187-220.
25
Clinical Life Tables
Clinical Life Table
Interval
Interval
Start Time
Interval
End Time ( )
Std. Error
of ( )
1 0 182 40 1 39.5 8 0.797 0.06
2 183 365 31 3 29.5 15 0.392 0.08
3 366 548 13 1 12.5 8 0.141 0.06
4 549 731 4 1 3.5 1 0.101 0.05
5 732 915 2 0 2 2 0 0
= (1 − )
Clinical Life Table:
NOTE:
The length of interval
is half year(183 days)
On average halfway through
the interval: = − /2
26
Survival Outcomes
 Cox Proportional Hazards model
This semi-parametric model is the most common model used for
survival data, it can be written as:
= ⇒ log = ,
Where ( ) is the baseline hazard function which must be positive.
The function shows that Cox Proportional Hazards model is a linear
model for the log of the hazard ratio. Maximum Likelihood methods are
used to estimate the perimeter, Cox (1972) uses the idea of a partial
likelihood, to generalize it for censoring.
=
∑ ∈ ( )
Where, is the censoring variable
(1=if event, 0 if censored) and
( ) as the risk set at the failure
time of individual
are estimated without any
assumptions about the form
of ( ). This is what makes
the model semi-parametric.
Cox, David R. "Regression models and life-tables", Journal of the Royal Statistical Society. Series B
(Methodological) (1972): 187-220.
27
Survival Outcomes
=
∑ ∈ ( )
=
∑ ∈ ( )
( ) as the risk set at the failure time
log-partial likelihood:
= log = log
∑ ∈ ( )
= − log
∈ ( )
The partial likelihood score equations are:
= = −
∑ ∈ ( )
∑ ∈ ( )
The maximum partial likelihood estimators can be found by
solving = 0.
COX (cont.)
28
Regularized Cox Regression
 Regularization functions for cox regression
= + ∗ ( )
( ) is a sparsity inducing norm such and is the regularization parameter.
Promotes Sparsity
Handles Correlation
Sparsity + Correlation
Adaptive Variants are
slightly more effective
Method Penalty Term Formulation
LASSO
Ridge
Elastic Net (EN) | | + (1 − )
Adaptive LASSO (AL) ∑ | |
Adaptive Elastic Net (AEN) | | + (1 − )
29
Survival Outcomes
 Random Survival Forests
Steps:
1. Draw B booststrap samples from the original data (63% in the bag data,
37% Out of bag data(OOB)).
2. Grow a survival tree for each bootstrap sample. randomly select
candidate variables, and splits the node using the candidate variable that
maximizes survival difference between daughter nodes.
3. Grow the tree to full size, each terminal node should have no less than
> 0 unique deaths.
4. Calculate a Cumulative Hazard Function (CHF) for each tree. Average to
obtain the ensemble CHF.
5. Using OOB data, calculate prediction error for the ensemble CHF.
Random
Forests
Survival
Tree RSF
Ishwaran, H., Kogalur, U. B., Blackstone, E. H. and Lauer, M. S. (2008). Random Survival Forests.
Annals of Applied Statistics 2, 841–860.
30
Survival Outcomes
 Survival Tree
Survival trees is similar to decision tree which is built by recursive
splitting of tree nodes. A node of a survival tree is considered “pure” if
the patients all survive for an identical span of time.
A good split for a node maximizes the survival difference between
daughters. The logrank is most commonly used dissimilarity measure
that estimates the survival difference between two groups. For each
node, examine every allowable split on each predictor variable, then
select and execute the best of these splits.
 Logrank Test
The logrank test is obtained by constructing a (2 X 2) table at each
distinct death time, and comparing the death rates between the two
groups, conditional on the number at risk in the groups.
LeBlanc, M. and Crowley, J. (1993). Survival Trees by Goodness of Split. Journal of the American
Statistical Association 88, 457–467.
31
Survival Outcomes
=
∑ − × /
∑
( − )
( − )
the numerator is the squared sum of deviations between the observed
and expected values. The denominator is the variance of the
(Patnaik ,1948).
The test statistic, , gets bigger as the differences between the
observed and expected values get larger, or as the variance gets
smaller.
Let , … , represent the ordered, distinct death times. At the -th
death time, we have the following table:
Segal, Mark Robert. "Regression trees for censored data." Biometrics (1988): 35-47.
32
Evaluation of Performance
 Brier Score (Mean Squared Error)
Brier Score is used to evaluate the binary outcomes, which is simply
defined as ( − ) , where the squared difference between actual
outcomes and predictions are calculated.
 (R-squared or coefficient of determination
The amount of is an overall measure to quantify the amount of
information in a model for a given data set. It gives the percent of total
variation that is described by the variation in X.
The total variation is defined as: = ∑ −
is the mean value of the , and is the total number of individuals.
= 1 −
( )
33
Sensitivity and Specificity
 Sensitivity: Proportion of actual positives which are correctly identified.
 Specificity: Proportion of actual negatives which are correctly identified.
Source: Wikipedia Page
34
ROC Curve
A ROC Curve is been used to quantify the diagnostic value of a test
over its whole range of possible cutoffs for classifying patients as
positive vs. negative. In each possible cutoff the true positive rate and
false positive rate will be calculated as the X and Y coordinates in the
ROC Curve.
35
C-Statistic
 The C-statistic is a rank order statistic for predictions against true
outcomes. Commonly used concordance measure is
 The c-statistic also known as the concordance probability is defined
as the ratio of the concordant to discordant pairs.
 A (i,j) pair is called a concordant pair if > and S( ) > S( ).
and are the survival times and S( ) and S( ) are the predicted
survival times.
 A discordant pair is one where if > and S( ) < S( ) .
 For a binary outcome c is identical to the area under the ROC curve.
 In c-index calculation all the pairs where the instance with shorter
survival time is censored are not considered as comparable pairs.
Uno, Hajime, et al. "On the C‐statistics for evaluating overall adequacy of risk prediction procedures
with censored survival data." Statistics in medicine 30.10 (2011): 1105-1117.
36
CASE STUDY: HEART FAILURE
READMISSION PREDICTION
37
Penalties for Poor Care - 30-Day Readmissions
 Hospitalizations account for more than 30% of the 2
trillion annual cost of healthcare in the United States.
Around 20% of all hospital admissions occur within 30
days of a previous discharge.
– not only expensive but are also potentially harmful,
and most importantly, they are often preventable.
 Medicare penalties from FY12 - heart failure, heart attack, and pneumonia.
 Identifying patients at risk of readmission can guide efficient resource
utilization and can potentially save millions of healthcare dollars each year.
 Effectively making predictions from such complex hospitalization data will
require the development of novel advanced analytical models.
38
Patient Readmission Cycle
Patient Index
Hospitalization
Adm Lab Values
(BUN, CREAT GFR)
Dis Lab Values
(BUN, CREAT GFR)
Demographics
(Age, Sex Race)
Medications
Diuretics
Patient in Hospital
Stay
Pharmacy Claims
ACE, Beta blockers
Patient
Discharge
Comorbidities
Diabetes,
Hypertension
Medications
ACE Inhibitor
Beta Blockers
Milrinone
Procedures
Cardiac Catheterization
Mechanical Ventilation
Hemodialysis
Treatment Period
Diagnosis
Pre-discharge Period
Follow-up PeriodReadmission Period
39
Challenges with Clinical Data Analysis
 Integrating Multi-source Data – Data maintained over different
sources such as procedures, medications, labs, demographics etc.
 Data Cleaning – Many labs and procedures have missing values.
 Non-uniformity in Columns – Certain lab values are measured in
meq/DL and mmol/DL.
– Solution: Apply the logarithmic transformation to standardize and
normalize the data.
 Multiple instances for unique patients – Several records from different
data sources are available for the same hospitalization of a patient.
– Solution: Obtain summary statistics such as minimum, maximum
and average for aggregating multiple instances into a single one.
40
Snapshot of EHR Data
ptID LABName LABval Low High Time Date
1665 BUN 24 10 25 12:00 07/01/2007
1665 BUN 28 10 25 12:30 07/01/2007
1665 BUN 32 10 25 09:00 07/02/2007
1665 BUN 26 10 25 18:00 07/02/2007
1665 BUN 30 10 25 09:00 07/03/2007
LABS DATA
ptID Age Sex Race
1665 57 M African American
DEMOGRAPHICS
ptID Log(BMAX) Log(BMIN) Log(BAVG) BUNCOUNT ALR
1665 1.5 1.38 1.44 5 0.8
ALR – Abnormal Labs Ratio
BMAX/BMIN/BAVG- BUN maximum/minimum/average
CLINICAL FEATURE
TRANSFORMATION
41
Snapshot of EHR Data
ptID Procedures Medications Time Date
1665 Cardiac cath ACE inhibitor 12:00 07/01/2007
1665 Mech vent ACE inhibitor 12:30 07/01/2007
1665 Cardiac cath Beta blocker 09:00 07/02/2007
1665 Cardiac cath ACE inhibitor 18:00 07/02/2007
1665 Mech vent Beta blocker 09:00 07/03/2007
PROCEDURES AND MEDICATIONS
ptID ACE inhib Beta block Cardiac cath Mech vent
1665 3 2 3 2
CLINICAL FEATURE
TRANSFORMATION
42
Medications
Laboratory
Reports
Demographics
Comorbidities
Procedures
Training Data from
the Hospital
Data
Warehouse
Data
Integration
Selected Data
Selected Data
Feature
Transformation
Discharge Time
Off-Line Processing
Laboratory Reports
Medications
Demographics
Comorbidities
Procedures
Patient records in
Hospital
Predictive
Analytics
Data
Cleaning
Evaluation
Take
Action
43
 Staying back in hospital
 Home nurse visitation
 Telephone support
 Automated reminder
RISK
COST
PredictiveAnalytics
LOW
HIGH
Interventions – Depending on the Risk scores at the Discharge
HOSPITAL RESOURCE UTILIZATION
THE COST SPENT FOR PATIENT CARE IS DETERMINED BY THE RISK
44
AUC Performance Comparisons
Heart Failure Patient data collected at Henry Ford Hospital between 2000-2009.
45
Biomarker Identification
Ross, Joseph S., et al. "Statistical models and patient predictors of readmission for heart failure: a
systematic review." Archives of internal medicine 168.13 (2008): 1371.
46
Top Biomarkers obtained from Regularized COX
Biomarkers Association Values
HGB Negative HGB < 12.3 mg/DL
BUN High Positive BUN > 40 mg/DL
CREAT High Positive CRET > 1.21 mg/DL
GFR High Negative GFR < 48.61 mg/DL
PHOS Positive PHOS > 3.9 meq/DL
K Positive K > 3.7 meq/DL
MG Positive MG > 1.2 meq/DL
LDL Negative LDL < 0.6 mg/DL
Already known and
well-studied
Biomarkers
Newly found
Biomarkers-
Clinicians showing
some interest
47
Conclusions and Future of our Study
 The performance of regularized Cox algorithms is better than that of
simple Cox regression and other standard predictive algorithms.
 Biomarkers selected through this method are clinically relevant.
 One advantage of Cox models is that there is no re-training needed if
we change the time of interest (from 30 days to 90 days).
 Currently working on temporal modeling using survival analysis.
 Adding claims data for a partial set of patients.

More Related Content

What's hot

The Use of Predictive Analytics in Health Care
The Use of Predictive Analytics in Health CareThe Use of Predictive Analytics in Health Care
The Use of Predictive Analytics in Health Carejetweedy
 
Big data and the Healthcare Sector
Big data and the Healthcare Sector Big data and the Healthcare Sector
Big data and the Healthcare Sector Chris Groves
 
Big implications of Big Data in healthcare
Big implications of Big Data in healthcareBig implications of Big Data in healthcare
Big implications of Big Data in healthcareGuires
 
BIG Data & Hadoop Applications in Healthcare
BIG Data & Hadoop Applications in HealthcareBIG Data & Hadoop Applications in Healthcare
BIG Data & Hadoop Applications in HealthcareSkillspeed
 
Data science applications and usecases
Data science applications and usecasesData science applications and usecases
Data science applications and usecasesSreenatha Reddy K R
 
Big Data in Medicine
Big Data in MedicineBig Data in Medicine
Big Data in MedicineNasir Arafat
 
Big Data Analytics for Smart Health Care
Big Data Analytics for Smart Health CareBig Data Analytics for Smart Health Care
Big Data Analytics for Smart Health CareEshan Bhuiyan
 
Machine Learning in Healthcare
Machine Learning in HealthcareMachine Learning in Healthcare
Machine Learning in HealthcareBigR.io
 
Using Big Data for Improved Healthcare Operations and Analytics
Using Big Data for Improved Healthcare Operations and AnalyticsUsing Big Data for Improved Healthcare Operations and Analytics
Using Big Data for Improved Healthcare Operations and AnalyticsPerficient, Inc.
 
Big Data Characteristics And Process PowerPoint Presentation Slides
Big Data Characteristics And Process PowerPoint Presentation SlidesBig Data Characteristics And Process PowerPoint Presentation Slides
Big Data Characteristics And Process PowerPoint Presentation SlidesSlideTeam
 
Heart Attack Prediction using Machine Learning
Heart Attack Prediction using Machine LearningHeart Attack Prediction using Machine Learning
Heart Attack Prediction using Machine Learningmohdshoaibuddin1
 
Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...
Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...
Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...Edureka!
 
Big data in healthcare
Big data in healthcareBig data in healthcare
Big data in healthcareDeZyre
 
Healthcare Information Analytics
Healthcare Information AnalyticsHealthcare Information Analytics
Healthcare Information AnalyticsFrank Wang
 
Machine Learning in Healthcare Diagnostics
Machine Learning in Healthcare DiagnosticsMachine Learning in Healthcare Diagnostics
Machine Learning in Healthcare DiagnosticsLarry Smarr
 
Big Data applications in Health Care
Big Data applications in Health CareBig Data applications in Health Care
Big Data applications in Health CareLeo Barella
 

What's hot (20)

The Use of Predictive Analytics in Health Care
The Use of Predictive Analytics in Health CareThe Use of Predictive Analytics in Health Care
The Use of Predictive Analytics in Health Care
 
Big Data Analytics
Big Data AnalyticsBig Data Analytics
Big Data Analytics
 
Big data and the Healthcare Sector
Big data and the Healthcare Sector Big data and the Healthcare Sector
Big data and the Healthcare Sector
 
Three Big Data Case Studies
Three Big Data Case StudiesThree Big Data Case Studies
Three Big Data Case Studies
 
Big implications of Big Data in healthcare
Big implications of Big Data in healthcareBig implications of Big Data in healthcare
Big implications of Big Data in healthcare
 
BIG Data & Hadoop Applications in Healthcare
BIG Data & Hadoop Applications in HealthcareBIG Data & Hadoop Applications in Healthcare
BIG Data & Hadoop Applications in Healthcare
 
Data science applications and usecases
Data science applications and usecasesData science applications and usecases
Data science applications and usecases
 
Big Data in Medicine
Big Data in MedicineBig Data in Medicine
Big Data in Medicine
 
Big Data Analytics for Smart Health Care
Big Data Analytics for Smart Health CareBig Data Analytics for Smart Health Care
Big Data Analytics for Smart Health Care
 
AI in Healthcare.pptx
AI in Healthcare.pptxAI in Healthcare.pptx
AI in Healthcare.pptx
 
Machine Learning in Healthcare
Machine Learning in HealthcareMachine Learning in Healthcare
Machine Learning in Healthcare
 
Using Big Data for Improved Healthcare Operations and Analytics
Using Big Data for Improved Healthcare Operations and AnalyticsUsing Big Data for Improved Healthcare Operations and Analytics
Using Big Data for Improved Healthcare Operations and Analytics
 
Big Data Characteristics And Process PowerPoint Presentation Slides
Big Data Characteristics And Process PowerPoint Presentation SlidesBig Data Characteristics And Process PowerPoint Presentation Slides
Big Data Characteristics And Process PowerPoint Presentation Slides
 
Heart Attack Prediction using Machine Learning
Heart Attack Prediction using Machine LearningHeart Attack Prediction using Machine Learning
Heart Attack Prediction using Machine Learning
 
Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...
Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...
Data Analytics For Beginners | Introduction To Data Analytics | Data Analytic...
 
Big data in healthcare
Big data in healthcareBig data in healthcare
Big data in healthcare
 
Healthcare Information Analytics
Healthcare Information AnalyticsHealthcare Information Analytics
Healthcare Information Analytics
 
Data analytics
Data analyticsData analytics
Data analytics
 
Machine Learning in Healthcare Diagnostics
Machine Learning in Healthcare DiagnosticsMachine Learning in Healthcare Diagnostics
Machine Learning in Healthcare Diagnostics
 
Big Data applications in Health Care
Big Data applications in Health CareBig Data applications in Health Care
Big Data applications in Health Care
 

Similar to Big Data Analytics for Healthcare

Machine Learning for Survival Analysis
Machine Learning for Survival AnalysisMachine Learning for Survival Analysis
Machine Learning for Survival AnalysisChandan Reddy
 
Android Based Questionnaires Application for Heart Disease Prediction System
Android Based Questionnaires Application for Heart Disease Prediction SystemAndroid Based Questionnaires Application for Heart Disease Prediction System
Android Based Questionnaires Application for Heart Disease Prediction Systemijtsrd
 
HEALTH PREDICTION ANALYSIS USING DATA MINING
HEALTH PREDICTION ANALYSIS USING DATA  MININGHEALTH PREDICTION ANALYSIS USING DATA  MINING
HEALTH PREDICTION ANALYSIS USING DATA MININGAshish Salve
 
Survival Analysis Lecture.ppt
Survival Analysis Lecture.pptSurvival Analysis Lecture.ppt
Survival Analysis Lecture.ppthabtamu biazin
 
1 FACULTY OF SCIENCE AND ENGINEERING SCHOOL OF COMPUT.docx
1  FACULTY OF SCIENCE AND ENGINEERING SCHOOL OF COMPUT.docx1  FACULTY OF SCIENCE AND ENGINEERING SCHOOL OF COMPUT.docx
1 FACULTY OF SCIENCE AND ENGINEERING SCHOOL OF COMPUT.docxmercysuttle
 
Prediction of heart disease using neural network
Prediction of heart disease using neural networkPrediction of heart disease using neural network
Prediction of heart disease using neural networkIRJET Journal
 
Ct lecture 20. survival analysis (part 2)
Ct lecture 20. survival analysis (part 2)Ct lecture 20. survival analysis (part 2)
Ct lecture 20. survival analysis (part 2)Hau Pham
 
An Ill-identified Classification to Predict Cardiac Disease Using Data Cluste...
An Ill-identified Classification to Predict Cardiac Disease Using Data Cluste...An Ill-identified Classification to Predict Cardiac Disease Using Data Cluste...
An Ill-identified Classification to Predict Cardiac Disease Using Data Cluste...ijdmtaiir
 
Metanalysis Lecture
Metanalysis LectureMetanalysis Lecture
Metanalysis Lecturedrmomusa
 
Measuring clinical utility: uncertainty in Net Benefit
Measuring clinical utility: uncertainty in Net BenefitMeasuring clinical utility: uncertainty in Net Benefit
Measuring clinical utility: uncertainty in Net BenefitLaure Wynants
 
1Big Data Analytics forHealthcareChandan K. ReddyD.docx
1Big Data Analytics forHealthcareChandan K. ReddyD.docx1Big Data Analytics forHealthcareChandan K. ReddyD.docx
1Big Data Analytics forHealthcareChandan K. ReddyD.docxaulasnilda
 
COMP60431 Machine Learning Advanced Computer Science MSc
COMP60431 Machine Learning Advanced Computer Science MScCOMP60431 Machine Learning Advanced Computer Science MSc
COMP60431 Machine Learning Advanced Computer Science MScbutest
 
Acute coronary-syndrome-prediction-using-data-mining-techniques--an-application
Acute coronary-syndrome-prediction-using-data-mining-techniques--an-applicationAcute coronary-syndrome-prediction-using-data-mining-techniques--an-application
Acute coronary-syndrome-prediction-using-data-mining-techniques--an-applicationCemal Ardil
 
Common statistical pitfalls & errors in biomedical research (a top-5 list)
Common statistical pitfalls & errors in biomedical research (a top-5 list)Common statistical pitfalls & errors in biomedical research (a top-5 list)
Common statistical pitfalls & errors in biomedical research (a top-5 list)Evangelos Kritsotakis
 
IRJET- Disease Prediction using Machine Learning
IRJET-  Disease Prediction using Machine LearningIRJET-  Disease Prediction using Machine Learning
IRJET- Disease Prediction using Machine LearningIRJET Journal
 
Developing and validating statistical models for clinical prediction and prog...
Developing and validating statistical models for clinical prediction and prog...Developing and validating statistical models for clinical prediction and prog...
Developing and validating statistical models for clinical prediction and prog...Evangelos Kritsotakis
 
MH Prediction Modeling and Validation -clean
MH Prediction Modeling and Validation -cleanMH Prediction Modeling and Validation -clean
MH Prediction Modeling and Validation -cleanMin-hyung Kim
 
Running head COURSE PROJECT –PHASE 3 COURSE PROJECT –PHASE 3.docx
Running head COURSE PROJECT –PHASE 3 COURSE PROJECT –PHASE 3.docxRunning head COURSE PROJECT –PHASE 3 COURSE PROJECT –PHASE 3.docx
Running head COURSE PROJECT –PHASE 3 COURSE PROJECT –PHASE 3.docxsusanschei
 

Similar to Big Data Analytics for Healthcare (20)

Machine Learning for Survival Analysis
Machine Learning for Survival AnalysisMachine Learning for Survival Analysis
Machine Learning for Survival Analysis
 
Survival analysis
Survival analysisSurvival analysis
Survival analysis
 
Android Based Questionnaires Application for Heart Disease Prediction System
Android Based Questionnaires Application for Heart Disease Prediction SystemAndroid Based Questionnaires Application for Heart Disease Prediction System
Android Based Questionnaires Application for Heart Disease Prediction System
 
HEALTH PREDICTION ANALYSIS USING DATA MINING
HEALTH PREDICTION ANALYSIS USING DATA  MININGHEALTH PREDICTION ANALYSIS USING DATA  MINING
HEALTH PREDICTION ANALYSIS USING DATA MINING
 
Survival Analysis Lecture.ppt
Survival Analysis Lecture.pptSurvival Analysis Lecture.ppt
Survival Analysis Lecture.ppt
 
1 FACULTY OF SCIENCE AND ENGINEERING SCHOOL OF COMPUT.docx
1  FACULTY OF SCIENCE AND ENGINEERING SCHOOL OF COMPUT.docx1  FACULTY OF SCIENCE AND ENGINEERING SCHOOL OF COMPUT.docx
1 FACULTY OF SCIENCE AND ENGINEERING SCHOOL OF COMPUT.docx
 
Prediction of heart disease using neural network
Prediction of heart disease using neural networkPrediction of heart disease using neural network
Prediction of heart disease using neural network
 
Ct lecture 20. survival analysis (part 2)
Ct lecture 20. survival analysis (part 2)Ct lecture 20. survival analysis (part 2)
Ct lecture 20. survival analysis (part 2)
 
An Ill-identified Classification to Predict Cardiac Disease Using Data Cluste...
An Ill-identified Classification to Predict Cardiac Disease Using Data Cluste...An Ill-identified Classification to Predict Cardiac Disease Using Data Cluste...
An Ill-identified Classification to Predict Cardiac Disease Using Data Cluste...
 
Metanalysis Lecture
Metanalysis LectureMetanalysis Lecture
Metanalysis Lecture
 
Lab 1 intro
Lab 1 introLab 1 intro
Lab 1 intro
 
Measuring clinical utility: uncertainty in Net Benefit
Measuring clinical utility: uncertainty in Net BenefitMeasuring clinical utility: uncertainty in Net Benefit
Measuring clinical utility: uncertainty in Net Benefit
 
1Big Data Analytics forHealthcareChandan K. ReddyD.docx
1Big Data Analytics forHealthcareChandan K. ReddyD.docx1Big Data Analytics forHealthcareChandan K. ReddyD.docx
1Big Data Analytics forHealthcareChandan K. ReddyD.docx
 
COMP60431 Machine Learning Advanced Computer Science MSc
COMP60431 Machine Learning Advanced Computer Science MScCOMP60431 Machine Learning Advanced Computer Science MSc
COMP60431 Machine Learning Advanced Computer Science MSc
 
Acute coronary-syndrome-prediction-using-data-mining-techniques--an-application
Acute coronary-syndrome-prediction-using-data-mining-techniques--an-applicationAcute coronary-syndrome-prediction-using-data-mining-techniques--an-application
Acute coronary-syndrome-prediction-using-data-mining-techniques--an-application
 
Common statistical pitfalls & errors in biomedical research (a top-5 list)
Common statistical pitfalls & errors in biomedical research (a top-5 list)Common statistical pitfalls & errors in biomedical research (a top-5 list)
Common statistical pitfalls & errors in biomedical research (a top-5 list)
 
IRJET- Disease Prediction using Machine Learning
IRJET-  Disease Prediction using Machine LearningIRJET-  Disease Prediction using Machine Learning
IRJET- Disease Prediction using Machine Learning
 
Developing and validating statistical models for clinical prediction and prog...
Developing and validating statistical models for clinical prediction and prog...Developing and validating statistical models for clinical prediction and prog...
Developing and validating statistical models for clinical prediction and prog...
 
MH Prediction Modeling and Validation -clean
MH Prediction Modeling and Validation -cleanMH Prediction Modeling and Validation -clean
MH Prediction Modeling and Validation -clean
 
Running head COURSE PROJECT –PHASE 3 COURSE PROJECT –PHASE 3.docx
Running head COURSE PROJECT –PHASE 3 COURSE PROJECT –PHASE 3.docxRunning head COURSE PROJECT –PHASE 3 COURSE PROJECT –PHASE 3.docx
Running head COURSE PROJECT –PHASE 3 COURSE PROJECT –PHASE 3.docx
 

Recently uploaded

Call Girls Ahmedabad Just Call 9630942363 Top Class Call Girl Service Available
Call Girls Ahmedabad Just Call 9630942363 Top Class Call Girl Service AvailableCall Girls Ahmedabad Just Call 9630942363 Top Class Call Girl Service Available
Call Girls Ahmedabad Just Call 9630942363 Top Class Call Girl Service AvailableGENUINE ESCORT AGENCY
 
Coimbatore Call Girls in Coimbatore 7427069034 genuine Escort Service Girl 10...
Coimbatore Call Girls in Coimbatore 7427069034 genuine Escort Service Girl 10...Coimbatore Call Girls in Coimbatore 7427069034 genuine Escort Service Girl 10...
Coimbatore Call Girls in Coimbatore 7427069034 genuine Escort Service Girl 10...chennailover
 
Coimbatore Call Girls in Thudiyalur : 7427069034 High Profile Model Escorts |...
Coimbatore Call Girls in Thudiyalur : 7427069034 High Profile Model Escorts |...Coimbatore Call Girls in Thudiyalur : 7427069034 High Profile Model Escorts |...
Coimbatore Call Girls in Thudiyalur : 7427069034 High Profile Model Escorts |...chennailover
 
Independent Call Girls Service Mohali Sector 116 | 6367187148 | Call Girl Ser...
Independent Call Girls Service Mohali Sector 116 | 6367187148 | Call Girl Ser...Independent Call Girls Service Mohali Sector 116 | 6367187148 | Call Girl Ser...
Independent Call Girls Service Mohali Sector 116 | 6367187148 | Call Girl Ser...karishmasinghjnh
 
Call Girls Rishikesh Just Call 9667172968 Top Class Call Girl Service Available
Call Girls Rishikesh Just Call 9667172968 Top Class Call Girl Service AvailableCall Girls Rishikesh Just Call 9667172968 Top Class Call Girl Service Available
Call Girls Rishikesh Just Call 9667172968 Top Class Call Girl Service Availableperfect solution
 
Premium Bangalore Call Girls Jigani Dail 6378878445 Escort Service For Hot Ma...
Premium Bangalore Call Girls Jigani Dail 6378878445 Escort Service For Hot Ma...Premium Bangalore Call Girls Jigani Dail 6378878445 Escort Service For Hot Ma...
Premium Bangalore Call Girls Jigani Dail 6378878445 Escort Service For Hot Ma...tanya dube
 
Top Quality Call Girl Service Kalyanpur 6378878445 Available Call Girls Any Time
Top Quality Call Girl Service Kalyanpur 6378878445 Available Call Girls Any TimeTop Quality Call Girl Service Kalyanpur 6378878445 Available Call Girls Any Time
Top Quality Call Girl Service Kalyanpur 6378878445 Available Call Girls Any TimeCall Girls Delhi
 
VIP Hyderabad Call Girls Bahadurpally 7877925207 ₹5000 To 25K With AC Room 💚😋
VIP Hyderabad Call Girls Bahadurpally 7877925207 ₹5000 To 25K With AC Room 💚😋VIP Hyderabad Call Girls Bahadurpally 7877925207 ₹5000 To 25K With AC Room 💚😋
VIP Hyderabad Call Girls Bahadurpally 7877925207 ₹5000 To 25K With AC Room 💚😋TANUJA PANDEY
 
Call Girls in Delhi Triveni Complex Escort Service(🔝))/WhatsApp 97111⇛47426
Call Girls in Delhi Triveni Complex Escort Service(🔝))/WhatsApp 97111⇛47426Call Girls in Delhi Triveni Complex Escort Service(🔝))/WhatsApp 97111⇛47426
Call Girls in Delhi Triveni Complex Escort Service(🔝))/WhatsApp 97111⇛47426jennyeacort
 
8980367676 Call Girls In Ahmedabad Escort Service Available 24×7 In Ahmedabad
8980367676 Call Girls In Ahmedabad Escort Service Available 24×7 In Ahmedabad8980367676 Call Girls In Ahmedabad Escort Service Available 24×7 In Ahmedabad
8980367676 Call Girls In Ahmedabad Escort Service Available 24×7 In AhmedabadGENUINE ESCORT AGENCY
 
Best Rate (Guwahati ) Call Girls Guwahati ⟟ 8617370543 ⟟ High Class Call Girl...
Best Rate (Guwahati ) Call Girls Guwahati ⟟ 8617370543 ⟟ High Class Call Girl...Best Rate (Guwahati ) Call Girls Guwahati ⟟ 8617370543 ⟟ High Class Call Girl...
Best Rate (Guwahati ) Call Girls Guwahati ⟟ 8617370543 ⟟ High Class Call Girl...Dipal Arora
 
Night 7k to 12k Chennai City Center Call Girls 👉👉 7427069034⭐⭐ 100% Genuine E...
Night 7k to 12k Chennai City Center Call Girls 👉👉 7427069034⭐⭐ 100% Genuine E...Night 7k to 12k Chennai City Center Call Girls 👉👉 7427069034⭐⭐ 100% Genuine E...
Night 7k to 12k Chennai City Center Call Girls 👉👉 7427069034⭐⭐ 100% Genuine E...hotbabesbook
 
Top Rated Hyderabad Call Girls Erragadda ⟟ 9332606886 ⟟ Call Me For Genuine ...
Top Rated  Hyderabad Call Girls Erragadda ⟟ 9332606886 ⟟ Call Me For Genuine ...Top Rated  Hyderabad Call Girls Erragadda ⟟ 9332606886 ⟟ Call Me For Genuine ...
Top Rated Hyderabad Call Girls Erragadda ⟟ 9332606886 ⟟ Call Me For Genuine ...chandars293
 
Call Girls Service Jaipur {9521753030 } ❤️VVIP BHAWNA Call Girl in Jaipur Raj...
Call Girls Service Jaipur {9521753030 } ❤️VVIP BHAWNA Call Girl in Jaipur Raj...Call Girls Service Jaipur {9521753030 } ❤️VVIP BHAWNA Call Girl in Jaipur Raj...
Call Girls Service Jaipur {9521753030 } ❤️VVIP BHAWNA Call Girl in Jaipur Raj...khalifaescort01
 
Low Rate Call Girls Bangalore {7304373326} ❤️VVIP NISHA Call Girls in Bangalo...
Low Rate Call Girls Bangalore {7304373326} ❤️VVIP NISHA Call Girls in Bangalo...Low Rate Call Girls Bangalore {7304373326} ❤️VVIP NISHA Call Girls in Bangalo...
Low Rate Call Girls Bangalore {7304373326} ❤️VVIP NISHA Call Girls in Bangalo...Sheetaleventcompany
 
All Time Service Available Call Girls Marine Drive 📳 9820252231 For 18+ VIP C...
All Time Service Available Call Girls Marine Drive 📳 9820252231 For 18+ VIP C...All Time Service Available Call Girls Marine Drive 📳 9820252231 For 18+ VIP C...
All Time Service Available Call Girls Marine Drive 📳 9820252231 For 18+ VIP C...Arohi Goyal
 
Most Beautiful Call Girl in Bangalore Contact on Whatsapp
Most Beautiful Call Girl in Bangalore Contact on WhatsappMost Beautiful Call Girl in Bangalore Contact on Whatsapp
Most Beautiful Call Girl in Bangalore Contact on WhatsappInaaya Sharma
 
Model Call Girls In Chennai WhatsApp Booking 7427069034 call girl service 24 ...
Model Call Girls In Chennai WhatsApp Booking 7427069034 call girl service 24 ...Model Call Girls In Chennai WhatsApp Booking 7427069034 call girl service 24 ...
Model Call Girls In Chennai WhatsApp Booking 7427069034 call girl service 24 ...hotbabesbook
 
Independent Call Girls In Jaipur { 8445551418 } ✔ ANIKA MEHTA ✔ Get High Prof...
Independent Call Girls In Jaipur { 8445551418 } ✔ ANIKA MEHTA ✔ Get High Prof...Independent Call Girls In Jaipur { 8445551418 } ✔ ANIKA MEHTA ✔ Get High Prof...
Independent Call Girls In Jaipur { 8445551418 } ✔ ANIKA MEHTA ✔ Get High Prof...parulsinha
 

Recently uploaded (20)

🌹Attapur⬅️ Vip Call Girls Hyderabad 📱9352852248 Book Well Trand Call Girls In...
🌹Attapur⬅️ Vip Call Girls Hyderabad 📱9352852248 Book Well Trand Call Girls In...🌹Attapur⬅️ Vip Call Girls Hyderabad 📱9352852248 Book Well Trand Call Girls In...
🌹Attapur⬅️ Vip Call Girls Hyderabad 📱9352852248 Book Well Trand Call Girls In...
 
Call Girls Ahmedabad Just Call 9630942363 Top Class Call Girl Service Available
Call Girls Ahmedabad Just Call 9630942363 Top Class Call Girl Service AvailableCall Girls Ahmedabad Just Call 9630942363 Top Class Call Girl Service Available
Call Girls Ahmedabad Just Call 9630942363 Top Class Call Girl Service Available
 
Coimbatore Call Girls in Coimbatore 7427069034 genuine Escort Service Girl 10...
Coimbatore Call Girls in Coimbatore 7427069034 genuine Escort Service Girl 10...Coimbatore Call Girls in Coimbatore 7427069034 genuine Escort Service Girl 10...
Coimbatore Call Girls in Coimbatore 7427069034 genuine Escort Service Girl 10...
 
Coimbatore Call Girls in Thudiyalur : 7427069034 High Profile Model Escorts |...
Coimbatore Call Girls in Thudiyalur : 7427069034 High Profile Model Escorts |...Coimbatore Call Girls in Thudiyalur : 7427069034 High Profile Model Escorts |...
Coimbatore Call Girls in Thudiyalur : 7427069034 High Profile Model Escorts |...
 
Independent Call Girls Service Mohali Sector 116 | 6367187148 | Call Girl Ser...
Independent Call Girls Service Mohali Sector 116 | 6367187148 | Call Girl Ser...Independent Call Girls Service Mohali Sector 116 | 6367187148 | Call Girl Ser...
Independent Call Girls Service Mohali Sector 116 | 6367187148 | Call Girl Ser...
 
Call Girls Rishikesh Just Call 9667172968 Top Class Call Girl Service Available
Call Girls Rishikesh Just Call 9667172968 Top Class Call Girl Service AvailableCall Girls Rishikesh Just Call 9667172968 Top Class Call Girl Service Available
Call Girls Rishikesh Just Call 9667172968 Top Class Call Girl Service Available
 
Premium Bangalore Call Girls Jigani Dail 6378878445 Escort Service For Hot Ma...
Premium Bangalore Call Girls Jigani Dail 6378878445 Escort Service For Hot Ma...Premium Bangalore Call Girls Jigani Dail 6378878445 Escort Service For Hot Ma...
Premium Bangalore Call Girls Jigani Dail 6378878445 Escort Service For Hot Ma...
 
Top Quality Call Girl Service Kalyanpur 6378878445 Available Call Girls Any Time
Top Quality Call Girl Service Kalyanpur 6378878445 Available Call Girls Any TimeTop Quality Call Girl Service Kalyanpur 6378878445 Available Call Girls Any Time
Top Quality Call Girl Service Kalyanpur 6378878445 Available Call Girls Any Time
 
VIP Hyderabad Call Girls Bahadurpally 7877925207 ₹5000 To 25K With AC Room 💚😋
VIP Hyderabad Call Girls Bahadurpally 7877925207 ₹5000 To 25K With AC Room 💚😋VIP Hyderabad Call Girls Bahadurpally 7877925207 ₹5000 To 25K With AC Room 💚😋
VIP Hyderabad Call Girls Bahadurpally 7877925207 ₹5000 To 25K With AC Room 💚😋
 
Call Girls in Delhi Triveni Complex Escort Service(🔝))/WhatsApp 97111⇛47426
Call Girls in Delhi Triveni Complex Escort Service(🔝))/WhatsApp 97111⇛47426Call Girls in Delhi Triveni Complex Escort Service(🔝))/WhatsApp 97111⇛47426
Call Girls in Delhi Triveni Complex Escort Service(🔝))/WhatsApp 97111⇛47426
 
8980367676 Call Girls In Ahmedabad Escort Service Available 24×7 In Ahmedabad
8980367676 Call Girls In Ahmedabad Escort Service Available 24×7 In Ahmedabad8980367676 Call Girls In Ahmedabad Escort Service Available 24×7 In Ahmedabad
8980367676 Call Girls In Ahmedabad Escort Service Available 24×7 In Ahmedabad
 
Best Rate (Guwahati ) Call Girls Guwahati ⟟ 8617370543 ⟟ High Class Call Girl...
Best Rate (Guwahati ) Call Girls Guwahati ⟟ 8617370543 ⟟ High Class Call Girl...Best Rate (Guwahati ) Call Girls Guwahati ⟟ 8617370543 ⟟ High Class Call Girl...
Best Rate (Guwahati ) Call Girls Guwahati ⟟ 8617370543 ⟟ High Class Call Girl...
 
Night 7k to 12k Chennai City Center Call Girls 👉👉 7427069034⭐⭐ 100% Genuine E...
Night 7k to 12k Chennai City Center Call Girls 👉👉 7427069034⭐⭐ 100% Genuine E...Night 7k to 12k Chennai City Center Call Girls 👉👉 7427069034⭐⭐ 100% Genuine E...
Night 7k to 12k Chennai City Center Call Girls 👉👉 7427069034⭐⭐ 100% Genuine E...
 
Top Rated Hyderabad Call Girls Erragadda ⟟ 9332606886 ⟟ Call Me For Genuine ...
Top Rated  Hyderabad Call Girls Erragadda ⟟ 9332606886 ⟟ Call Me For Genuine ...Top Rated  Hyderabad Call Girls Erragadda ⟟ 9332606886 ⟟ Call Me For Genuine ...
Top Rated Hyderabad Call Girls Erragadda ⟟ 9332606886 ⟟ Call Me For Genuine ...
 
Call Girls Service Jaipur {9521753030 } ❤️VVIP BHAWNA Call Girl in Jaipur Raj...
Call Girls Service Jaipur {9521753030 } ❤️VVIP BHAWNA Call Girl in Jaipur Raj...Call Girls Service Jaipur {9521753030 } ❤️VVIP BHAWNA Call Girl in Jaipur Raj...
Call Girls Service Jaipur {9521753030 } ❤️VVIP BHAWNA Call Girl in Jaipur Raj...
 
Low Rate Call Girls Bangalore {7304373326} ❤️VVIP NISHA Call Girls in Bangalo...
Low Rate Call Girls Bangalore {7304373326} ❤️VVIP NISHA Call Girls in Bangalo...Low Rate Call Girls Bangalore {7304373326} ❤️VVIP NISHA Call Girls in Bangalo...
Low Rate Call Girls Bangalore {7304373326} ❤️VVIP NISHA Call Girls in Bangalo...
 
All Time Service Available Call Girls Marine Drive 📳 9820252231 For 18+ VIP C...
All Time Service Available Call Girls Marine Drive 📳 9820252231 For 18+ VIP C...All Time Service Available Call Girls Marine Drive 📳 9820252231 For 18+ VIP C...
All Time Service Available Call Girls Marine Drive 📳 9820252231 For 18+ VIP C...
 
Most Beautiful Call Girl in Bangalore Contact on Whatsapp
Most Beautiful Call Girl in Bangalore Contact on WhatsappMost Beautiful Call Girl in Bangalore Contact on Whatsapp
Most Beautiful Call Girl in Bangalore Contact on Whatsapp
 
Model Call Girls In Chennai WhatsApp Booking 7427069034 call girl service 24 ...
Model Call Girls In Chennai WhatsApp Booking 7427069034 call girl service 24 ...Model Call Girls In Chennai WhatsApp Booking 7427069034 call girl service 24 ...
Model Call Girls In Chennai WhatsApp Booking 7427069034 call girl service 24 ...
 
Independent Call Girls In Jaipur { 8445551418 } ✔ ANIKA MEHTA ✔ Get High Prof...
Independent Call Girls In Jaipur { 8445551418 } ✔ ANIKA MEHTA ✔ Get High Prof...Independent Call Girls In Jaipur { 8445551418 } ✔ ANIKA MEHTA ✔ Get High Prof...
Independent Call Girls In Jaipur { 8445551418 } ✔ ANIKA MEHTA ✔ Get High Prof...
 

Big Data Analytics for Healthcare

  • 1. 1 Big Data Analytics for Healthcare Chandan K. Reddy Department of Computer Science Wayne State University Tutorial presentation at SIGKDD 2013. The updated tutorial slides are available at http://dmkd.cs.wayne.edu/TUTORIAL/Healthcare/ Jimeng Sun Healthcare Analytics Department IBM TJ Watson Research Center
  • 2. 2 Motivation Hersh, W., Jacko, J. A., Greenes, R., Tan, J., Janies, D., Embi, P. J., & Payne, P. R. (2011). Health-care hit or miss? Nature, 470(7334), 327. In 2012, worldwide digital healthcare data was estimated to be equal to 500 petabytes and is expected to reach 25,000 petabytes in 2020. Can we learn from the past to become better in the future ?? Healthcare Data is becoming more complex !!
  • 3. 3 Organization of this Tutorial  Introduction  Clinical Predictive Modeling – Case Study: Readmission Prediction  Scalable Healthcare Analytics Platform  Genetic Data Analysis  Conclusion
  • 4. 4 What is Big Data  Large and complex data sets which are difficult to process using traditional database technology. The four dimensions (V’s) of Big Data Big data is not just about size. • Finds insights from complex, noisy, heterogeneous, longitudinal, and voluminous data. • It aims to answer questions that were previously unanswered. BIG DATA Velocity Veracity Variety Volume
  • 5. 5 Healthcare Analytics in the Electronic Era  Old way: Data are expensive and small – Input data are from clinical trials, which is small and costly – Modeling effort is small since the data is limited  EHR era: Data are cheap and large – Broader patient population – Noisy data – Heterogeneous data – Diverse scale – Longitudinal records
  • 6. 6 GOAL: Provide Personalized care through right intervention to the right patient at the right time. Electronic Health Records Evidence + Insights Improved outcomes through smarter decisions Lower costs Big Data Analytics Overall Goals of Big Data Analytics in Healthcare BehavioralGenomic Public Health
  • 7. 7 Examples for Big Data Analytics in Healthcare  Medicare Penalties: Medicare penalizes hospitals that have high rates of readmissions among patients with Heart failure, Heart attack, Pneumonia.  BRAIN Initiative: Find new ways to treat, cure, and even prevent brain disorders, such as Alzheimer’s disease, epilepsy, and traumatic brain injury. A new bold $100 million research initiative designed to revolutionize our understanding of the human brain.  Heritage Health Prize: Develop algorithms to predict the number of days a patient will spend in a hospital in the next year. http://www.heritagehealthprize.com  GE Head Health Challenge: Methods for Diagnosis and Prognosis of Mild Traumatic Brain Injuries. Develop Algorithms and Analytical Tools, and Biomarkers and other technologies. A total of $60M in awards. Industry Initiatives Government Initiatives
  • 8. 8 Data Collection and Analysis Jensen, Peter B., Lars J. Jensen, and Søren Brunak. "Mining electronic health records: towards better research applications and clinical care." Nature Reviews Genetics (2012). Effectively integrating and efficiently analyzing various forms of healthcare data over a period of time can answer many of the impending healthcare problems.
  • 10. 10 Different Kinds of Outcomes Diagnostic Presence of a disease What disease How serious Economic Cost Survival analysis … Binary Outcomes Categorical Outcomes Ordinal Outcomes Continuous Outcomes Survival Outcomes
  • 11. 11 Examples for Different Outcomes in Healthcare  Binary Outcomes – Death: yes/no – Adverse event: yes/no  Continuous Outcomes – Days of Hospital stay – Visual analogue score  Ordinal Outcomes – Quality of life scale – Grade of tumour progression – Count data (Number of heart attacks)  Survival Outcomes – Cancer survival – Clinical Trials
  • 12. 12 Continuous Outcomes  Linear Regression The Outcome y is assumed to be the linear combination of the variables with the estimated regression coefficients . = + the coefficients usually estimated by minimize the RSS (“residual sum of squares”). Various penalized RSS methods are used to shrink the , and achieve a more stabilized model. = ( − ) ( − Β) The minimum of the sum of squares is found by setting the gradient to zero. = ( )  Generalized Additive Model (GAM) = + + Where refers to the intercept, refers to functions for each predictor which is more flexible.
  • 13. 13 Binary Outcomes  Logistic Regression The model is stated in terms of the probability that y=1, rather than the outcomes Y directly, it can be viewed as a linear function in the logistic transformation: Pr = 1 = 1 1 + where = ∑ , the coefficients usually estimated by maximum likelihood in a standard logistic regression approach.
  • 14. 14 Binary Outcomes  Bayes Modeling The predict is given based on the Bayes theorem: = ∙ ( ) ( ) P(X) is always constant, and the prior probability P(D) can be easily calculated from the training set. Two ways to estimate the class-conditional probabilities : naϊve Bayes and Bayesian belief network. naϊve Bayes: assuming the attributes are conditionally independent, so = ( | )
  • 15. 15 Binary Outcomes  Classification and Regression Trees Recursively split the patients into smaller subgroups. Splits are based on cut-off levels of the predictors, which can maximize the difference between two subgroups, and minimize the variability within these subgroups. Some other commonly used binary outcomes models: Multivariate additive regression splines (MARS) models Support Vector Machine (SVM) Neural Nets …
  • 16. 16 Categorical Outcomes  Polytomous Logistic Regression The model for j outcomes categories can be written as: = . = = + , , One vs all approach. Similar to multi-class prediction. j-1 models are fitted and combined to the prediction. Ordinal Outcomes  Proportional Odds Logistic Regression A common set of regression coefficients is assumed across all levels of the outcome, and intercepts are estimated for each level. = +
  • 17. 17 Survival Outcomes  Survival Analysis Survival Analysis typically focuses on time to event data. Typically, survival data are not fully observed, but rather are censored. In survival analysis, the primary focus is on the survival function. = Pr ≥ where is the Failure time or the time that a event happens. So the means the probability that the instance can survive for longer than a certain time t. The censoring variable is the time of withdrawn, lost, or study end time. The hazard function: the probability the “event” of interest occurs in the next instant, given survival to time t. Death Right censored
  • 18. 18 Survival Outcomes  Non-Parametric Approaches – Kaplan-Meier Analysis – Clinical Life Tables – Estimates the Survival Function  Semi-Parametric Approaches – Cox Proportional Hazards Model – Estimates the Hazard Function
  • 19. 20 Survival Outcomes  Kaplan-Meier Analysis Kaplan-Meier analysis is a nonparametric approach to survival outcomes. The survival function is: = (1 − ) : where • … is a set of distinct death times observed in the sample. • is the number of deaths at . • is the number of censored observations between and . • is the number of individuals “at risk” right before the death. = − − Efron, Bradley. "Logistic regression, survival analysis, and the Kaplan-Meier curve." Journal of the American Statistical Association 83.402 (1988): 414-425.
  • 20. 21 Survival Outcomes Patient Days Status 1 21 1 2 39 1 3 77 1 4 133 1 5 141 2 6 152 1 7 153 1 8 161 1 9 179 1 10 184 1 11 197 1 12 199 1 13 214 1 14 228 1 Patient Days Status 15 256 2 16 260 1 17 261 1 18 266 1 19 269 1 20 287 3 21 295 1 22 308 1 23 311 1 24 321 2 25 326 1 26 355 1 27 361 1 28 374 1 Patient Days Status 29 398 1 30 414 1 31 420 1 32 468 2 33 483 1 34 489 1 35 505 1 36 539 1 37 565 3 38 618 1 39 793 1 40 794 1 Example Status 1: Death 2: Lost to follow up 3: Withdrawn Alive
  • 21. 22 Kaplan-Meier Analysis Kaplan-Meier Analysis Time Status ( ) 1 21 1 1 0 40 0.975 2 39 1 1 0 39 0.95 3 77 1 1 0 38 0.925 4 133 1 1 0 37 0.9 5 141 2 0 1 36 . 6 152 1 1 0 35 0.874 7 153 1 1 0 34 0.849 = (1 − ) : K_M Estimator:
  • 22. 23 Kaplan-Meier Analysis K_M Estimator: Time Status ( ) ∑ Time Status ( ) ∑Estimate Sdv Error Estimate Sdv Error 1 21 1 0.975 0.025 1 40 21 287 3 . . 18 20 2 39 1 0.95 0.034 2 39 22 295 1 0.508 0.081 19 19 3 77 1 0.925 0.042 3 38 23 308 1 0.479 0.081 20 18 4 133 1 0.9 0.047 4 37 24 311 1 0.451 0.081 21 17 5 141 2 . . 4 36 25 321 2 . . 21 16 6 152 1 0.874 0.053 5 35 26 326 1 0.421 0.081 22 15 7 153 1 0.849 0.057 6 34 27 355 1 0.391 0.081 23 14 8 161 1 0.823 0.061 7 33 28 361 1 0.361 0.08 24 13 9 179 1 0.797 0.064 8 32 29 374 1 0.331 0.079 25 12 10 184 1 0.771 0.067 9 31 30 398 1 0.301 0.077 26 11 11 193 1 0.746 0.07 10 30 31 414 1 0.271 0.075 27 10 12 197 1 0.72 0.072 11 29 32 420 1 0.241 0.072 28 9 13 199 1 0.694 0.074 12 28 33 468 2 . . 28 8 14 214 1 0.669 0.075 13 27 34 483 1 0.206 0.07 29 7 15 228 1 0.643 0.077 14 26 35 489 1 0.172 0.066 30 6 16 256 2 . . 14 25 36 505 1 0.137 0.061 31 5 17 260 1 0.616 0.078 15 24 37 539 1 0.103 0.055 32 4 18 261 1 0.589 0.079 16 23 38 565 3 . . 32 3 19 266 1 0.563 0.08 17 22 39 618 1 0.052 0.046 33 2 20 269 1 0.536 0.08 18 21 40 794 1 0 0 34 1
  • 23. 24 Clinical Life Tables  Clinical life tables applies to grouped survival data from studies in patients with specific diseases, it focuses more on the conditional probability of dying within the interval. the time interval is [ , ) VS. … is a set of distinct death timesThe survival function is: = (1 − ) K_M analysis suits small data set with a more accurate analysis, Clinical life table suit for large data set with a relatively approximate result. nonparametric Assumption: • at the beginning of each interval: = − • at the end of each interval: = • on average halfway through the interval: = − /2 Cox, David R. "Regression models and life-tables", Journal of the Royal Statistical Society. Series B (Methodological) (1972): 187-220.
  • 24. 25 Clinical Life Tables Clinical Life Table Interval Interval Start Time Interval End Time ( ) Std. Error of ( ) 1 0 182 40 1 39.5 8 0.797 0.06 2 183 365 31 3 29.5 15 0.392 0.08 3 366 548 13 1 12.5 8 0.141 0.06 4 549 731 4 1 3.5 1 0.101 0.05 5 732 915 2 0 2 2 0 0 = (1 − ) Clinical Life Table: NOTE: The length of interval is half year(183 days) On average halfway through the interval: = − /2
  • 25. 26 Survival Outcomes  Cox Proportional Hazards model This semi-parametric model is the most common model used for survival data, it can be written as: = ⇒ log = , Where ( ) is the baseline hazard function which must be positive. The function shows that Cox Proportional Hazards model is a linear model for the log of the hazard ratio. Maximum Likelihood methods are used to estimate the perimeter, Cox (1972) uses the idea of a partial likelihood, to generalize it for censoring. = ∑ ∈ ( ) Where, is the censoring variable (1=if event, 0 if censored) and ( ) as the risk set at the failure time of individual are estimated without any assumptions about the form of ( ). This is what makes the model semi-parametric. Cox, David R. "Regression models and life-tables", Journal of the Royal Statistical Society. Series B (Methodological) (1972): 187-220.
  • 26. 27 Survival Outcomes = ∑ ∈ ( ) = ∑ ∈ ( ) ( ) as the risk set at the failure time log-partial likelihood: = log = log ∑ ∈ ( ) = − log ∈ ( ) The partial likelihood score equations are: = = − ∑ ∈ ( ) ∑ ∈ ( ) The maximum partial likelihood estimators can be found by solving = 0. COX (cont.)
  • 27. 28 Regularized Cox Regression  Regularization functions for cox regression = + ∗ ( ) ( ) is a sparsity inducing norm such and is the regularization parameter. Promotes Sparsity Handles Correlation Sparsity + Correlation Adaptive Variants are slightly more effective Method Penalty Term Formulation LASSO Ridge Elastic Net (EN) | | + (1 − ) Adaptive LASSO (AL) ∑ | | Adaptive Elastic Net (AEN) | | + (1 − )
  • 28. 29 Survival Outcomes  Random Survival Forests Steps: 1. Draw B booststrap samples from the original data (63% in the bag data, 37% Out of bag data(OOB)). 2. Grow a survival tree for each bootstrap sample. randomly select candidate variables, and splits the node using the candidate variable that maximizes survival difference between daughter nodes. 3. Grow the tree to full size, each terminal node should have no less than > 0 unique deaths. 4. Calculate a Cumulative Hazard Function (CHF) for each tree. Average to obtain the ensemble CHF. 5. Using OOB data, calculate prediction error for the ensemble CHF. Random Forests Survival Tree RSF Ishwaran, H., Kogalur, U. B., Blackstone, E. H. and Lauer, M. S. (2008). Random Survival Forests. Annals of Applied Statistics 2, 841–860.
  • 29. 30 Survival Outcomes  Survival Tree Survival trees is similar to decision tree which is built by recursive splitting of tree nodes. A node of a survival tree is considered “pure” if the patients all survive for an identical span of time. A good split for a node maximizes the survival difference between daughters. The logrank is most commonly used dissimilarity measure that estimates the survival difference between two groups. For each node, examine every allowable split on each predictor variable, then select and execute the best of these splits.  Logrank Test The logrank test is obtained by constructing a (2 X 2) table at each distinct death time, and comparing the death rates between the two groups, conditional on the number at risk in the groups. LeBlanc, M. and Crowley, J. (1993). Survival Trees by Goodness of Split. Journal of the American Statistical Association 88, 457–467.
  • 30. 31 Survival Outcomes = ∑ − × / ∑ ( − ) ( − ) the numerator is the squared sum of deviations between the observed and expected values. The denominator is the variance of the (Patnaik ,1948). The test statistic, , gets bigger as the differences between the observed and expected values get larger, or as the variance gets smaller. Let , … , represent the ordered, distinct death times. At the -th death time, we have the following table: Segal, Mark Robert. "Regression trees for censored data." Biometrics (1988): 35-47.
  • 31. 32 Evaluation of Performance  Brier Score (Mean Squared Error) Brier Score is used to evaluate the binary outcomes, which is simply defined as ( − ) , where the squared difference between actual outcomes and predictions are calculated.  (R-squared or coefficient of determination The amount of is an overall measure to quantify the amount of information in a model for a given data set. It gives the percent of total variation that is described by the variation in X. The total variation is defined as: = ∑ − is the mean value of the , and is the total number of individuals. = 1 − ( )
  • 32. 33 Sensitivity and Specificity  Sensitivity: Proportion of actual positives which are correctly identified.  Specificity: Proportion of actual negatives which are correctly identified. Source: Wikipedia Page
  • 33. 34 ROC Curve A ROC Curve is been used to quantify the diagnostic value of a test over its whole range of possible cutoffs for classifying patients as positive vs. negative. In each possible cutoff the true positive rate and false positive rate will be calculated as the X and Y coordinates in the ROC Curve.
  • 34. 35 C-Statistic  The C-statistic is a rank order statistic for predictions against true outcomes. Commonly used concordance measure is  The c-statistic also known as the concordance probability is defined as the ratio of the concordant to discordant pairs.  A (i,j) pair is called a concordant pair if > and S( ) > S( ). and are the survival times and S( ) and S( ) are the predicted survival times.  A discordant pair is one where if > and S( ) < S( ) .  For a binary outcome c is identical to the area under the ROC curve.  In c-index calculation all the pairs where the instance with shorter survival time is censored are not considered as comparable pairs. Uno, Hajime, et al. "On the C‐statistics for evaluating overall adequacy of risk prediction procedures with censored survival data." Statistics in medicine 30.10 (2011): 1105-1117.
  • 35. 36 CASE STUDY: HEART FAILURE READMISSION PREDICTION
  • 36. 37 Penalties for Poor Care - 30-Day Readmissions  Hospitalizations account for more than 30% of the 2 trillion annual cost of healthcare in the United States. Around 20% of all hospital admissions occur within 30 days of a previous discharge. – not only expensive but are also potentially harmful, and most importantly, they are often preventable.  Medicare penalties from FY12 - heart failure, heart attack, and pneumonia.  Identifying patients at risk of readmission can guide efficient resource utilization and can potentially save millions of healthcare dollars each year.  Effectively making predictions from such complex hospitalization data will require the development of novel advanced analytical models.
  • 37. 38 Patient Readmission Cycle Patient Index Hospitalization Adm Lab Values (BUN, CREAT GFR) Dis Lab Values (BUN, CREAT GFR) Demographics (Age, Sex Race) Medications Diuretics Patient in Hospital Stay Pharmacy Claims ACE, Beta blockers Patient Discharge Comorbidities Diabetes, Hypertension Medications ACE Inhibitor Beta Blockers Milrinone Procedures Cardiac Catheterization Mechanical Ventilation Hemodialysis Treatment Period Diagnosis Pre-discharge Period Follow-up PeriodReadmission Period
  • 38. 39 Challenges with Clinical Data Analysis  Integrating Multi-source Data – Data maintained over different sources such as procedures, medications, labs, demographics etc.  Data Cleaning – Many labs and procedures have missing values.  Non-uniformity in Columns – Certain lab values are measured in meq/DL and mmol/DL. – Solution: Apply the logarithmic transformation to standardize and normalize the data.  Multiple instances for unique patients – Several records from different data sources are available for the same hospitalization of a patient. – Solution: Obtain summary statistics such as minimum, maximum and average for aggregating multiple instances into a single one.
  • 39. 40 Snapshot of EHR Data ptID LABName LABval Low High Time Date 1665 BUN 24 10 25 12:00 07/01/2007 1665 BUN 28 10 25 12:30 07/01/2007 1665 BUN 32 10 25 09:00 07/02/2007 1665 BUN 26 10 25 18:00 07/02/2007 1665 BUN 30 10 25 09:00 07/03/2007 LABS DATA ptID Age Sex Race 1665 57 M African American DEMOGRAPHICS ptID Log(BMAX) Log(BMIN) Log(BAVG) BUNCOUNT ALR 1665 1.5 1.38 1.44 5 0.8 ALR – Abnormal Labs Ratio BMAX/BMIN/BAVG- BUN maximum/minimum/average CLINICAL FEATURE TRANSFORMATION
  • 40. 41 Snapshot of EHR Data ptID Procedures Medications Time Date 1665 Cardiac cath ACE inhibitor 12:00 07/01/2007 1665 Mech vent ACE inhibitor 12:30 07/01/2007 1665 Cardiac cath Beta blocker 09:00 07/02/2007 1665 Cardiac cath ACE inhibitor 18:00 07/02/2007 1665 Mech vent Beta blocker 09:00 07/03/2007 PROCEDURES AND MEDICATIONS ptID ACE inhib Beta block Cardiac cath Mech vent 1665 3 2 3 2 CLINICAL FEATURE TRANSFORMATION
  • 41. 42 Medications Laboratory Reports Demographics Comorbidities Procedures Training Data from the Hospital Data Warehouse Data Integration Selected Data Selected Data Feature Transformation Discharge Time Off-Line Processing Laboratory Reports Medications Demographics Comorbidities Procedures Patient records in Hospital Predictive Analytics Data Cleaning Evaluation Take Action
  • 42. 43  Staying back in hospital  Home nurse visitation  Telephone support  Automated reminder RISK COST PredictiveAnalytics LOW HIGH Interventions – Depending on the Risk scores at the Discharge HOSPITAL RESOURCE UTILIZATION THE COST SPENT FOR PATIENT CARE IS DETERMINED BY THE RISK
  • 43. 44 AUC Performance Comparisons Heart Failure Patient data collected at Henry Ford Hospital between 2000-2009.
  • 44. 45 Biomarker Identification Ross, Joseph S., et al. "Statistical models and patient predictors of readmission for heart failure: a systematic review." Archives of internal medicine 168.13 (2008): 1371.
  • 45. 46 Top Biomarkers obtained from Regularized COX Biomarkers Association Values HGB Negative HGB < 12.3 mg/DL BUN High Positive BUN > 40 mg/DL CREAT High Positive CRET > 1.21 mg/DL GFR High Negative GFR < 48.61 mg/DL PHOS Positive PHOS > 3.9 meq/DL K Positive K > 3.7 meq/DL MG Positive MG > 1.2 meq/DL LDL Negative LDL < 0.6 mg/DL Already known and well-studied Biomarkers Newly found Biomarkers- Clinicians showing some interest
  • 46. 47 Conclusions and Future of our Study  The performance of regularized Cox algorithms is better than that of simple Cox regression and other standard predictive algorithms.  Biomarkers selected through this method are clinically relevant.  One advantage of Cox models is that there is no re-training needed if we change the time of interest (from 30 days to 90 days).  Currently working on temporal modeling using survival analysis.  Adding claims data for a partial set of patients.