2. OUTLINE
This presentation covers:
1 Introduction
2 Project Goal
3 Literature Review
4 Existing Approach of the Paper
5 Proposed Approach
6 Technology Used
7 Publicly available Dataset
8 Existing Paper Results
9 Conclusion
GAURAV DUBEY Heart-Disease-Analysis-and-Prediction / 27
6. Introduction
Cardiovascular diseases (CVDs) take the lives of 17.7 million people
every year, 31 percent of all global deaths.
That is roughly a third of all deaths on the planet and half of all
non-communicable-disease-related deaths.
There's a song by Peter Gabriel called The Power of Your Heart, and
it fits perfectly with my project theme.
7. Project Goal
The main goal of this project is to build a model
that can predict the occurrence of heart disease
from a combination of features (risk factors)
describing the disease. Different machine learning
classification techniques will be implemented and
compared on a standard performance metric such
as accuracy.
9. Literature Review
Sellappan Palaniappan et al. proposed a model built with data mining
techniques such as Naïve Bayes, Decision Trees, Logistic Regression,
SVM and Random Forests. It helped establish vital knowledge, e.g.
patterns and relationships among the medical factors connected with
heart disease.
10. Literature Review
The table shows the data mining techniques used to diagnose heart
disease across different heart disease datasets.
Author               Year  Technique used               Attributes
Carlos et al.        2001  Association rules            25
Dr. K. Usha Rani     2011  Classification (NN)          13
Jesmin Nahar et al.  2013  Predictive Apriori, Tertius  14
Ms. Ishtake et al.   2013  D-Tree, NN, Naive Bayes      15
Shadab et al.        2012  Naive Bayes                  15
Shantakumar et al.   2009  MAFIA, Clustering, K-Means   13
12. Existing Approach of the Paper
A. Classification: Classification is a classic data mining technique based
on machine learning. It assigns each item in a data set to one of a
predefined set of classes or groups. Classification methods make use of
mathematical techniques such as decision trees, linear programming,
neural networks and statistics.
13. Existing Approach of the Paper
B. Clustering: Clustering is a data mining technique that automatically
groups objects with similar characteristics into meaningful or useful
clusters. Unlike classification, where objects are assigned to predefined
classes, clustering also defines the classes itself before placing objects
in them. For example, clustering applied to heart disease prediction
yields groups of patients who share the same risk factors, such as a
separate list of patients with high blood sugar and related risk factors.
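The grouping idea described above can be sketched with a one-dimensional k-means. This is a minimal illustration only, not the method of the referenced paper: the function name `kmeans_1d`, the deterministic initialisation and the blood-sugar readings are all assumptions made up for the example.

```python
def kmeans_1d(values, k=2, iterations=20):
    """Cluster 1-D values into k >= 2 groups with Lloyd's algorithm."""
    ordered = sorted(values)
    # Deterministic start (an assumption): spread the centres over the sorted data.
    centres = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            # Assign each value to its nearest centre.
            nearest = min(range(k), key=lambda c: abs(v - centres[c]))
            clusters[nearest].append(v)
        # Move each centre to the mean of its cluster (keep it if the cluster is empty).
        centres = [sum(c) / len(c) if c else centres[i]
                   for i, c in enumerate(clusters)]
    return centres, clusters


# Made-up fasting blood sugar readings in mg/dl, purely illustrative.
blood_sugar = [92, 98, 101, 105, 160, 172, 181, 190]
centres, clusters = kmeans_1d(blood_sugar, k=2)
print(centres, clusters)
```

No class labels are supplied; the two groups (normal and elevated readings) emerge from the data, which is the difference from classification noted above.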
14. Existing Approach of the Paper
C. Prediction: Prediction, as its name implies, is a data mining technique
that discovers relationships among independent variables and between
dependent and independent variables. For instance, prediction analysis
can be used in sales to predict future profit: if sales is the independent
variable, profit could be the dependent variable. Based on historical
sales and profit data, we can then draw a fitted regression curve that is
used for profit prediction.
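The sales and profit example above can be sketched as an ordinary least-squares line fit. The helper `fit_line` and the numbers are hypothetical and chosen so that profit follows 0.2 * sales + 1 exactly; real data would not fit this cleanly.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up historical data: sales is the independent variable, profit the dependent one.
sales = [10.0, 20.0, 30.0, 40.0]
profit = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(sales, profit)
predicted_profit = slope * 50.0 + intercept  # forecast for a future sales figure
print(slope, intercept, predicted_profit)
```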
16. Proposed Approach
K-Nearest Neighbours (KNN): The K-Nearest Neighbours algorithm is a
non-parametric method used for classification and regression. The
principle behind nearest-neighbour methods is to find a predefined
number of training samples closest in distance to the new point and
predict the label from these.
Decision Trees: The DT algorithm creates a model that predicts the
value of a target variable by learning simple decision rules inferred from
the data features. It is simple to understand and interpret, and it is
possible to visualize how important a particular feature was for the tree.
Logistic Regression: Logistic regression is a basic technique in
statistical analysis that attempts to predict a categorical (typically
binary) outcome based on prior observations. A logistic regression
algorithm looks at the relationship between a dependent variable and one
or more independent variables.
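Of the three methods above, the nearest-neighbour principle is the easiest to show in a few lines. The sketch below is a plain-Python illustration, not the project's actual code: `knn_predict`, the (age, max heart rate) pairs and their labels are invented for the example, and the features are left unscaled for brevity, whereas real use would normally standardise them first.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Predict a label by majority vote among the k closest training points."""
    # Pair every training point's Euclidean distance to the query with its label.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical (age, max heart rate) pairs; 1 = disease present, 0 = absent.
train_points = [(45, 170), (50, 165), (39, 180), (63, 120), (67, 110), (60, 125)]
train_labels = [0, 0, 0, 1, 1, 1]

older_patient = knn_predict(train_points, train_labels, (65, 118))
younger_patient = knn_predict(train_points, train_labels, (42, 175))
print(older_patient, younger_patient)
```

There is no training step beyond storing the samples, which is what "non-parametric" means in practice here.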
17. Proposed Approach
Gaussian Naive Bayes : In machine learning, naive Bayes
classifiers are a family of simple probabilistic classifiers based on applying
Bayes’ theorem with strong (naive) independence assumptions between the
features.
Support Vector Machines: Support Vector Machines are
perhaps one of the most popular machine learning algorithms. They are
the go-to method for a high-performing algorithm with little tuning. As a
first step, we try it with default settings.
Random Forests : Random forests are an ensemble learning
method for classification, regression and other tasks, that operate by
constructing a multitude of decision trees at training time and outputting
the class that is the mode of the classes (classification) or mean prediction
(regression) of the individual trees.
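To make the "naive" independence assumption above concrete, here is a small Gaussian Naive Bayes written from scratch. It is a sketch under stated assumptions: the function names, the variance-smoothing constant `1e-9` and the toy (age, max heart rate) data are illustrative and not taken from the project.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(rows, labels):
    """Estimate a prior and per-feature (mean, variance) for each class."""
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)
    model = {}
    for label, members in grouped.items():
        stats = []
        for column in zip(*members):
            mean = sum(column) / len(column)
            # Small constant avoids division by zero for constant features.
            var = sum((v - mean) ** 2 for v in column) / len(column) + 1e-9
            stats.append((mean, var))
        model[label] = (len(members) / len(rows), stats)
    return model


def predict_gaussian_nb(model, row):
    """Pick the class with the highest log posterior."""
    def log_posterior(label):
        prior, stats = model[label]
        total = math.log(prior)
        # Independence assumption: per-feature log likelihoods simply add up.
        for value, (mean, var) in zip(row, stats):
            total += -0.5 * math.log(2 * math.pi * var) - (value - mean) ** 2 / (2 * var)
        return total
    return max(model, key=log_posterior)


# Hypothetical (age, max heart rate) pairs; 1 = disease present, 0 = absent.
rows = [(45, 170), (50, 165), (39, 180), (63, 120), (67, 110), (60, 125)]
labels = [0, 0, 0, 1, 1, 1]
model = fit_gaussian_nb(rows, labels)
print(predict_gaussian_nb(model, (65, 118)), predict_gaussian_nb(model, (42, 175)))
```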
19. Technology Used
The software configuration for this project is Spark on AWS: Spark 2.3.1
on Hadoop 2.8.4 YARN with Ganglia 3.7.2 and Zeppelin 0.7.3.
21. Publicly available Dataset
The dataset used in this project contains 14 variables. The dependent
(target) variable to be predicted, 'diagnosis', indicates whether a
person is healthy or suffers from heart disease. Experiments with the
Cleveland database have concentrated on distinguishing disease presence
(values 1, 2, 3, 4) from absence (value 0). There are several missing
attribute values, marked with the symbol '?'. The header row is
missing in this dataset, so the column names have to be inserted manually.
Dataset available at
http://archive.ics.uci.edu/ml/datasets/Heart+Disease
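The three preparation steps just described (inserting column names, handling '?' and collapsing values 1 to 4 into "disease present") can be sketched as follows. The underscore column names are an assumption derived from the feature list in these slides, the three sample rows are illustrative stand-ins in the dataset's comma-separated layout, and the code assumes the target column itself is never missing.

```python
import csv
import io

# Column names inserted manually because the file has no header row (assumed spellings).
COLUMNS = [
    "age", "sex", "chest_pain", "blood_pressure", "serum_cholesterol",
    "fasting_blood_sugar", "electrocardiographic", "max_heart_rate",
    "induced_angina", "ST_depression", "slope", "no_of_vessels", "thal",
    "diagnosis",
]

# Illustrative rows only; the third one has a missing value marked '?'.
SAMPLE = """\
63.0,1.0,1.0,145.0,233.0,1.0,2.0,150.0,0.0,2.3,3.0,0.0,6.0,0
67.0,1.0,4.0,160.0,286.0,0.0,2.0,108.0,1.0,1.5,2.0,3.0,3.0,2
52.0,1.0,3.0,138.0,223.0,0.0,0.0,169.0,0.0,0.0,1.0,?,3.0,0
"""


def load_rows(handle):
    """Parse header-less CSV rows into dicts, mapping '?' to None."""
    rows = []
    for record in csv.reader(handle):
        if not record:
            continue
        values = [None if cell.strip() == "?" else float(cell) for cell in record]
        row = dict(zip(COLUMNS, values))
        # Collapse disease levels 1-4 into a single "present" class.
        row["diagnosis"] = 1 if row["diagnosis"] > 0 else 0
        rows.append(row)
    return rows


rows = load_rows(io.StringIO(SAMPLE))
# One simple policy for missing values: drop incomplete rows.
complete_rows = [r for r in rows if None not in r.values()]
print(len(rows), len(complete_rows))
```

Dropping incomplete rows is only one option; imputing a typical value per column is a common alternative.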
22. Publicly available Dataset
Features information:
age - age in years
sex - sex (1 = male; 0 = female)
chest pain - chest pain type (1 = typical angina; 2 = atypical angina;
3 = non-anginal pain; 4 = asymptomatic)
blood pressure - resting blood pressure (in mm Hg on admission to
the hospital)
serum cholesterol - serum cholesterol in mg/dl
fasting blood sugar - fasting blood sugar > 120 mg/dl (1 = true;
0 = false)
electrocardiographic - resting electrocardiographic results (0 =
normal; 1 = having ST-T wave abnormality; 2 = left ventricular
hypertrophy)
max heart rate - maximum heart rate achieved
induced angina - exercise induced angina (1 = yes; 0 = no)
23. Publicly available Dataset
ST depression - ST depression induced by exercise relative to rest
slope - the slope of the peak exercise ST segment (1 = upsloping;
2 = flat; 3 = downsloping)
no of vessels - number of major vessels (0-3) colored by fluoroscopy
thal - 3 = normal; 6 = fixed defect; 7 = reversible defect
diagnosis - the predicted attribute: diagnosis of heart disease
(angiographic disease status) (value 0 = < 50 percent diameter
narrowing; value 1 = > 50 percent diameter narrowing)
25. Existing Paper Results on Dataset
The algorithm to mine association rules uses several important constraints
to reduce the number of rules and speed up the mining process. It uses a
constraint to exclude certain combinations of attributes, eliminating
trivial or useless associations. Certain attributes are constrained to
appear only in the antecedent, only in the consequent, or in both, to get
medically meaningful rules. Rules are constrained to include a maximum
number of items to make them simpler and more general. Maximum support is
a constraint used to eliminate trivial rules. These constraints allowed
the authors to mine medical records at a minimum support involving only
two transactions. The experimental section of the paper discusses several
important association rules predicting the absence or presence of heart
disease.
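A few of the constraints listed above can be illustrated with a brute-force rule miner. This is a simplified sketch, not the algorithm from the paper: it enforces only consequent-only attributes, a maximum rule size and a minimum support counted in transactions, and it leaves out the maximum-support constraint. The function `mine_rules`, the confidence threshold and the item names are all hypothetical.

```python
from itertools import combinations


def mine_rules(transactions, consequents, min_support=2,
               min_confidence=0.8, max_items=3):
    """Enumerate rules antecedent -> consequent under simple constraints."""
    # Consequent items may never appear in the antecedent.
    items = sorted({i for t in transactions for i in t} - set(consequents))
    rules = []
    # Antecedent size plus one consequent item must not exceed max_items.
    for size in range(1, max_items):
        for antecedent in combinations(items, size):
            covered = [t for t in transactions if set(antecedent) <= t]
            if len(covered) < min_support:
                continue
            for target in consequents:
                hits = sum(1 for t in covered if target in t)
                confidence = hits / len(covered)
                # Support is the number of transactions containing the whole rule.
                if hits >= min_support and confidence >= min_confidence:
                    rules.append((antecedent, target, hits, confidence))
    return rules


# Made-up patient records encoded as item sets.
transactions = [
    {"age>60", "high_chol", "disease"},
    {"age>60", "high_chol", "disease"},
    {"age>60", "healthy"},
    {"high_chol", "healthy"},
    {"age<=60", "healthy"},
]
rules = mine_rules(transactions, consequents=["disease", "healthy"])
print(rules)
```

With a minimum support of two transactions, the only rule that survives is the one combining both risk factors, which shows how the size and confidence limits prune weaker single-factor rules.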
27. Conclusion
Accuracy of the techniques compared:
Technique            Accuracy
KNN                  0.868132
Decision Trees       0.791209
Logistic Regression  0.857143
Naive Bayes          0.868132
SVM                  0.879121
Random Forests       0.890110
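For reference, accuracy is the fraction of test samples whose predicted label matches the true label. The sketch below uses made-up label vectors. As a side observation, the reported KNN value equals 79/91 to six decimals, which would be consistent with a 91-sample test split; that split size is an inference from the numbers and is not stated in these slides.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that equal the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Made-up labels: 6 of the 8 predictions are correct.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
score = accuracy(y_true, y_pred)

# Inference only: 79 correct out of 91 reproduces the reported KNN figure.
implied_knn = round(79 / 91, 6)
print(score, implied_knn)
```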
28. References
Mining Constrained Association Rules to Predict Heart Disease:
https://doi.org/10.1109/ICDM.2001.989549
Predictive Analytics on Electronic Health Records (EHRs) Using
Hadoop and Hive:
https://ieeexplore.ieee.org/abstract/document/7226129/
Predictive Methodology for Diabetic Data Analysis in Big Data:
https://doi.org/10.1016/j.procs.2015.04.069
Intelligent and Effective Heart Attack Prediction System Using Data
Mining and Artificial Neural Network:
https://www.scribd.com/document/88906669/Heart-Prediction
29. Saturday, 29 September World Heart Day 2018