Heart disease classification
Project- 2
Presented by-
Sneha Dey
Dept. of CSE, B.Tech, 3rd year
(Seacom Engineering College)
ABOUT THE PROJECT
Machine Learning is used across many fields around the world, and the healthcare industry is no exception. Machine
Learning can play an essential role in predicting the presence or absence of locomotor disorders, heart disease and more. Such
information, if predicted well in advance, can give doctors important insights, allowing them to adapt diagnosis and
treatment to each patient.
In this project, I worked on predicting potential heart disease in people using Machine Learning
algorithms. The algorithms were the K Neighbors Classifier and the Support Vector Classifier, each run with and without
PCA.
Objective
Improve cardiovascular health and quality of life through prevention, detection, and treatment.
Early identification and treatment of heart attacks and strokes.
Prevention of repeat cardiovascular events and reduction in deaths from cardiovascular disease.
IMPORTING LIBRARIES AND DATASET
I imported several libraries for the project and all the necessary Machine Learning algorithms.
The dataset has a total of 13 features and 1 target variable. There are no missing values, so we don’t need to handle
any nulls. Next, I used the describe() method.
dataset.describe()
The method revealed that the range of each variable is different: the maximum value of age is 77, but for chol it is 564.
Thus, feature scaling must be performed on the dataset.
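As a minimal sketch of the scaling step, the snippet below standardizes two columns with scikit-learn's StandardScaler. The toy rows and the choice of StandardScaler are assumptions for illustration; the slides do not say which scaler was used, and the real dataset has 13 features.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical toy rows; only the column names age and chol come from the slides.
dataset = pd.DataFrame({
    "age": [29, 45, 54, 63, 77],
    "chol": [126, 204, 246, 330, 564],
})

# Columns with very different ranges that we want on a common scale.
columns_to_scale = ["age", "chol"]

scaler = StandardScaler()
scaled = pd.DataFrame(
    scaler.fit_transform(dataset[columns_to_scale]),
    columns=columns_to_scale,
    index=dataset.index,
)

# Each scaled column now has mean 0 and unit (population) standard deviation.
print(scaled.describe())
```

In a real pipeline the scaler would normally be fitted on the training split only and then applied to the test split.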
Correlation Matrix
To begin with, let’s look at the correlation matrix of the features and try to analyse it. The figure size is set to 12 x 8 using
rcParams. Then, I used pyplot to show the correlation matrix. Using xticks and yticks, I added the feature names to the
correlation matrix. colorbar() shows the colorbar for the matrix.
It’s easy to see that no single feature has a very high correlation with the target value. Also, some of the features have a
negative correlation with the target value and some have a positive one.
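The plotting steps described above could look roughly like the sketch below. The random stand-in data and the column name thalach are assumptions; the sketch uses the Axes-level equivalents of xticks, yticks and colorbar().

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import rcParams

# Hypothetical stand-in data; the real dataset has 13 features plus target.
rng = np.random.default_rng(0)
dataset = pd.DataFrame({
    "age": rng.integers(29, 78, size=100),
    "chol": rng.integers(126, 565, size=100),
    "thalach": rng.integers(71, 203, size=100),
    "target": rng.integers(0, 2, size=100),
})

# Figure size of 12 x 8, set through rcParams as in the slides.
rcParams["figure.figsize"] = 12, 8

corr = dataset.corr()

fig, ax = plt.subplots()
image = ax.matshow(corr)

# Label both axes with the column names.
positions = np.arange(len(corr.columns))
ax.set_xticks(positions)
ax.set_xticklabels(corr.columns, rotation=90)
ax.set_yticks(positions)
ax.set_yticklabels(corr.columns)

# Colorbar mapping colors to correlation values.
fig.colorbar(image)

# Correlation of every feature with the target, strongest first.
target_corr = corr["target"].drop("target").sort_values(key=abs, ascending=False)
print(target_corr)
```

Sorting the target column of the matrix by absolute value is a quick way to check in numbers what the plot shows visually.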
Histogram
The best part about this type of plot is that it takes a single command to draw and provides a lot of
information in return. Just use dataset.hist().
Let’s take a look at the plots. They show how each feature and the label are distributed over different ranges,
which further confirms the need for scaling. Wherever you see discrete bars, the variable is actually categorical.
We will need to handle these categorical variables before applying Machine Learning. Our target labels
have two classes, 0 for no disease and 1 for disease.
Predict the Target class
It’s essential that the dataset we are working on is approximately balanced. An extremely imbalanced dataset
can render the whole model training useless.
For the x-axis I used the unique() values from the target column and then set their names using xticks. For the y-axis,
I used value_counts() to get the count for each class. I colored the bars green and red.
From the plot, we can see that the classes are almost balanced, so we are good to proceed with data processing.
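A small sketch of the class-balance check, assuming a toy target column; the tick labels and the assignment of green and red to the two classes are illustrative choices, not taken from the slides.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical target column: 0 = no disease, 1 = disease.
target = pd.Series([1, 0, 1, 1, 0, 1, 0, 0, 1, 0], name="target")

# Count the rows in each class, ordered by class label.
counts = target.value_counts().sort_index()

fig, ax = plt.subplots()
ax.bar(list(counts.index), list(counts.values), color=["green", "red"])
ax.set_xticks(list(counts.index))
ax.set_xticklabels(["No disease (0)", "Disease (1)"])
ax.set_ylabel("Count")

# Share of each class; values near 0.5 indicate a balanced dataset.
print(counts / counts.sum())
```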
Bar plot for Count of male and female
As per the dataset,
0 – female and 1 – male
The plot shows that there are roughly twice as many male as female patients in the study.
Scatter plot between Age and Maximum heart rate
A scatter plot is a type of plot or mathematical diagram that uses Cartesian coordinates to display values for, typically, two
variables; here, Age and Maximum Heart Rate.
The plot suggests that patients between the ages of 40 and 70 are the most affected with respect to Maximum Heart Rate.
Data Processing and Splitting
To work with categorical variables, we should break each categorical column into dummy columns with 1s and 0s.
In this project, I took 4 models, varied their parameters and compared the final models.
I split the dataset into 67% training data and 33% testing data.
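The two steps above can be sketched with pandas.get_dummies and scikit-learn's train_test_split. The toy rows, the choice of sex and cp as the categorical columns, and random_state=0 are assumptions for illustration.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy rows; sex and cp stand in for the categorical columns.
dataset = pd.DataFrame({
    "age": [63, 37, 41, 56, 57, 44, 52, 58, 60],
    "sex": [1, 1, 0, 1, 0, 1, 1, 0, 1],
    "cp": [3, 2, 1, 1, 0, 1, 2, 0, 3],
    "target": [1, 1, 1, 1, 0, 0, 1, 0, 0],
})

# Break each categorical column into 0/1 dummy columns (sex_0, sex_1, cp_0, ...).
encoded = pd.get_dummies(dataset, columns=["sex", "cp"])

X = encoded.drop(columns="target")
y = encoded["target"]

# 67% training data and 33% testing data, as in the slides.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=0
)

print(X.columns.tolist())
print(len(X_train), len(X_test))
```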
Support Vector Machine
I fit the model on the training set using SVC (Support Vector Classifier) and then use the test set to see what kind
of prediction results we get. This classifier aims to form a hyperplane that separates the classes
as well as possible by maximizing the distance (margin) between the hyperplane and the nearest data points. Several kernels
are available, and the kernel determines how the hyperplane is found. I tried four kernels: linear, poly, rbf, and sigmoid.
Once I had the scores for each, I used the rainbow method to select a different color for each bar and plotted a bar graph of the
scores achieved by each.
As can be seen from the plot, the linear kernel performed the best for this dataset and
achieved a score of 83%.
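A sketch of the kernel comparison, run on synthetic data from make_classification because the project dataset is not included here. The scores it prints will therefore differ from the 83% reported above; the scaling step and the use of the rainbow colormap via plt.get_cmap are assumptions.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in with 13 features, matching only the shape of the real data.
X, y = make_classification(n_samples=300, n_features=13, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=0
)

# Fit the scaler on the training split only, then apply it to both splits.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

# Train one classifier per kernel and record its test accuracy.
kernels = ["linear", "poly", "rbf", "sigmoid"]
svc_scores = {}
for kernel in kernels:
    classifier = SVC(kernel=kernel)
    classifier.fit(X_train_s, y_train)
    svc_scores[kernel] = classifier.score(X_test_s, y_test)

# One color per bar, sampled from the rainbow colormap.
colors = plt.get_cmap("rainbow")(np.linspace(0, 1, len(kernels)))
fig, ax = plt.subplots()
ax.bar(kernels, [svc_scores[k] for k in kernels], color=colors)
ax.set_xlabel("Kernel")
ax.set_ylabel("Test accuracy")

print(svc_scores)
```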
K – nearest neighbor classifier
This classifier looks at the classes of the K nearest neighbors of a given data point and assigns the majority class
to that data point. The number of neighbors can be varied. I varied it from 1 to 20 neighbors and
calculated the test score in each case. Then, I plotted a line graph of the number of neighbors against the test score achieved in each
case.
As you can see, we achieved the maximum score of 87%
when the number of neighbors was 8.
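The sweep over K can be sketched as follows, again on synthetic stand-in data, so the best K and its score will not match the 87% at K = 8 reported above.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in with 13 features; not the project dataset.
X, y = make_classification(n_samples=300, n_features=13, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=0
)

scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

# Test accuracy for K = 1 .. 20; knn_scores[0] belongs to K = 1.
knn_scores = []
for k in range(1, 21):
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train_s, y_train)
    knn_scores.append(knn.score(X_test_s, y_test))

# Smallest K that reaches the highest test accuracy.
best_k = max(range(1, 21), key=lambda k: knn_scores[k - 1])
print(best_k, knn_scores[best_k - 1])
```

Picking K on the test set, as done here and in the slides, tends to give an optimistic score; cross-validation on the training data is a common alternative.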
SVM with PCA
The PCA/SVM-based method involves PCA-based data selection and image feature extraction for SVM classification; this
method can be used to solve the detection problems inherent in imprecise, uncertain, and incoherent data from multiple
sensors.
As you can see, the best score achieved on this dataset with this method
was 80% accuracy.
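One way to combine PCA with a classifier is a scikit-learn pipeline, sketched below on synthetic data. The number of components (5), the linear kernel and the scaling step are assumptions; the slides do not state how PCA was configured.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in with 13 features; not the project dataset.
X, y = make_classification(n_samples=300, n_features=13, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=0
)

# Scale, project onto 5 principal components (assumed value), then classify.
pca_svc = make_pipeline(
    StandardScaler(),
    PCA(n_components=5),
    SVC(kernel="linear"),
)
pca_svc.fit(X_train, y_train)
pca_score = pca_svc.score(X_test, y_test)

# Fraction of the total variance kept by the retained components.
explained = pca_svc.named_steps["pca"].explained_variance_ratio_.sum()
print(pca_score, explained)
```

Replacing SVC(kernel="linear") with KNeighborsClassifier gives the KNN variant of the same pipeline.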
KNN with PCA
I use KNN to classify and have also applied PCA to reduce the dimensionality. With a low K value, such as a single
closest neighbor, the KNN model tries to fit the data very closely.
As you can see, we achieved the maximum
score of 56% when the number of neighbors
was 20.
ACCURACY
1. Support Vector Classifier: 83%
2. K Neighbours Classifier: 87%
3. SVM with PCA: 80%
4. KNN with PCA: 56%
The K Neighbours Classifier achieved the best score, 87%.
Inference
The K-Neighbours Classifier is among the most popular learning methods; it groups data points by similarity and is used to build
machine learning models, particularly for heart disease prediction and document classification.
THANK YOU
