Efficient Medical Diagnosis of Human Heart
Diseases Using Machine Learning Techniques
With and Without GridSearchCV
PRESENTED BY: V LALITHA
Authors: Ghulab Nabi Ahmad, Hira Fatima, Shafiullah, Abdelaziz Salah and Imadadullah
IEEE Access
Introduction
• Cardiovascular diseases are the leading cause of death worldwide and a major global health concern. Early and
accurate diagnosis of heart disease is essential for effective treatment and management.
• Machine learning has emerged as a valuable tool for medical diagnosis, offering the potential for improved
accuracy and efficiency.
• The paper applies machine learning techniques to improve the diagnosis of human heart diseases.
• Its primary aim is to increase diagnostic accuracy by comparing several classifiers and optimizing their
hyperparameters with GridSearchCV.
Model’s Flow Diagram
1. Data Collection:
Two datasets are used: Kaggle's Heart Disease Cleveland, Hungary, Switzerland & Long Beach V dataset and the Heart Disease UCI Kaggle dataset.
2. Data Pre-Processing:
Data cleaning, data transformation, feature selection, data splitting, and handling class imbalance.
3. Data Mining:
The process of extracting useful insights, patterns, and knowledge from large datasets; here, the heart-disease
datasets listed above.
4. Proposed Model:
The proposed model applies machine learning algorithms including Logistic Regression, k-Nearest Neighbors (KNN),
Support Vector Machine (SVM), and XGBoost.
Image source: https://ieeexplore.ieee.org/document/9751602/
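A minimal sketch of this flow in Python with scikit-learn, under stated assumptions: synthetic data from make_classification stands in for the Kaggle and UCI heart-disease datasets, a stratified split plus class_weight="balanced" is only one possible way to handle class imbalance (the paper's exact method is not given here), and scikit-learn's GradientBoostingClassifier stands in for XGBoost so the example needs no extra package.

```python
# Sketch of the flow: pre-process, split, train four classifiers, compare accuracy.
# Assumptions: synthetic data replaces the heart-disease datasets, and
# GradientBoostingClassifier is a stand-in for XGBoost.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# 300 synthetic "patients", 13 synthetic features, mild class imbalance (60% / 40%).
X, y = make_classification(n_samples=300, n_features=13, n_informative=6,
                           weights=[0.6, 0.4], random_state=0)

# Stratified split keeps the class ratio the same in the training and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

models = {
    # class_weight="balanced" weights each class inversely to its frequency.
    "Logistic Regression": LogisticRegression(class_weight="balanced", max_iter=1000),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "SVM": SVC(class_weight="balanced"),
    "Gradient boosting (XGBoost stand-in)": GradientBoostingClassifier(random_state=0),
}

scores = {}
for name, model in models.items():
    # The scaler is fit on the training data only, so no test information leaks in.
    pipe = make_pipeline(StandardScaler(), model)
    pipe.fit(X_train, y_train)
    scores[name] = pipe.score(X_test, y_test)  # accuracy on the held-out test set

for name, acc in scores.items():
    print(f"{name}: {acc:.3f}")
```

Wrapping the scaler and model in one pipeline keeps pre-processing and training consistent and makes the same object usable later inside a hyperparameter search.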
Dataset Information
Heart Disease Prediction Proposed Model
Image source: https://ieeexplore.ieee.org/document/9751602/
Logistic Regression
Logistic Regression is primarily used for binary classification problems, where you want to predict one of two
possible outcomes.
1. Binary Classification: Logistic Regression is employed when you want to classify data into one of two classes,
such as Yes/No, True/False, Spam/Not Spam, or 1/0.
2. Model: It uses a logistic (sigmoid) function to transform a linear combination of input features into a
probability score between 0 and 1.
3. Probability: The output is the estimated probability that the input belongs to the positive class (e.g., heart
disease present). Values close to 1 favor the positive class, values close to 0 favor the negative class, and a
threshold, commonly 0.5, converts the probability into a class label.
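A small NumPy sketch of the sigmoid mapping and the 0.5 decision threshold. The weights w and bias b are hypothetical values chosen for illustration, not coefficients fitted to any heart-disease data.

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(X, w, b):
    """P(y = 1 | x) = sigmoid(w . x + b) for each row x of X."""
    return sigmoid(X @ w + b)

def predict(X, w, b, threshold=0.5):
    """Convert probabilities into 0/1 class labels using a decision threshold."""
    return (predict_proba(X, w, b) >= threshold).astype(int)

# Hypothetical weights for two features (illustration only, not fitted values).
w = np.array([0.8, -0.5])
b = -0.2
X = np.array([[2.0, 1.0],    # linear score z = 1.6 - 0.5 - 0.2 = 0.9
              [0.0, 2.0]])   # linear score z = 0.0 - 1.0 - 0.2 = -1.2

print(predict_proba(X, w, b))  # approx. [0.711 0.231]
print(predict(X, w, b))        # [1 0]
```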
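K-Nearest Neighbors (KNN)
• K-Nearest Neighbors (KNN) is a supervised machine learning algorithm for classification and regression.
• It predicts the outcome for a new data point from its k nearest neighbors in the training data: a majority vote of
their labels for classification, or the average of their values for regression.
• The choice of k is critical: a small k makes predictions sensitive to noise, while a large k smooths the decision
boundary and can blur the distinction between classes.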
Image source: https://ieeexplore.ieee.org/document/9751602/
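A from-scratch sketch of KNN classification in NumPy (Euclidean distance, majority vote). The toy two-feature training data is purely illustrative. It also shows why k matters: the same query point is labeled 0 with k=3 but 1 with k=5.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    """Classify x_new by majority vote among its k nearest training points."""
    # Euclidean distance from x_new to every training point.
    distances = np.linalg.norm(X_train - x_new, axis=1)
    # Indices of the k smallest distances.
    nearest = np.argsort(distances)[:k]
    # Majority vote over the neighbors' labels.
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Toy data: two points of class 0 near the origin, four of class 1 further away.
X_train = np.array([[1.0, 1.0], [1.5, 2.0], [3.0, 4.0],
                    [5.0, 7.0], [3.5, 5.0], [4.5, 5.0]])
y_train = np.array([0, 0, 1, 1, 1, 1])
query = np.array([1.2, 1.5])

print(knn_predict(X_train, y_train, query, k=3))  # 0 (neighbors: 0, 0, 1)
print(knn_predict(X_train, y_train, query, k=5))  # 1 (neighbors: 0, 0, 1, 1, 1)
```

In practice, features should be scaled first (for example with scikit-learn's StandardScaler), because Euclidean distance is dominated by features with large numeric ranges.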
Support Vector Machine
Support Vector Machine (SVM) is a powerful supervised machine learning algorithm used for classification and
regression tasks.
1. Objective: SVM aims to find the hyperplane that best separates different classes in the feature space while
maximizing the margin between the classes.
2. Margin: The margin is the distance between the hyperplane and the nearest data points from each class. SVM
looks for the hyperplane with the largest margin.
3. Kernel Trick: SVM can handle both linear and non-linear data by using kernel functions, which implicitly map the
data into a higher-dimensional space where a separating hyperplane may exist.
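To illustrate the kernel trick, here is a scikit-learn sketch comparing a linear kernel with an RBF kernel on synthetic concentric circles (an assumed toy dataset, not heart-disease data). No straight line separates the two rings, so the linear SVM should perform poorly, while the RBF kernel can find a separating boundary.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two concentric rings: not linearly separable in the original 2-D feature space.
X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Same classifier, two kernels: only the implicit feature mapping differs.
linear_acc = SVC(kernel="linear").fit(X_train, y_train).score(X_test, y_test)
rbf_acc = SVC(kernel="rbf").fit(X_train, y_train).score(X_test, y_test)

print(f"linear kernel accuracy: {linear_acc:.2f}")
print(f"RBF kernel accuracy:    {rbf_acc:.2f}")
```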
Extreme Gradient Boosting
Extreme Gradient Boosting, often referred to as XGBoost, is a powerful and popular machine learning algorithm
that belongs to the ensemble learning family, specifically gradient boosting.
1. Ensemble Learning: XGBoost is an ensemble learning technique, which means it combines the predictions of
multiple weaker models (typically decision trees) to create a strong predictive model. This ensemble approach
helps improve predictive accuracy.
2. Gradient Boosting: XGBoost employs a gradient boosting framework, which trains trees sequentially. Each new
decision tree is fitted to the errors (the gradients of the loss) of the ensemble built so far, so the model is
refined step by step into a more accurate and robust predictor.
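The sequential error-correcting idea can be sketched as plain gradient boosting for squared-error regression, with scikit-learn's DecisionTreeRegressor as the weak learner. This is not XGBoost itself: XGBoost additionally uses a regularized objective and second-order gradient information. The noisy sine data and the function names fit_boosting and predict_boosting are hypothetical.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_boosting(X, y, n_trees=50, learning_rate=0.1, max_depth=2):
    """Plain gradient boosting with squared-error loss (not the XGBoost library)."""
    base = float(np.mean(y))                 # start from a constant prediction
    pred = np.full(len(y), base)
    trees = []
    for _ in range(n_trees):
        # For squared error, the negative gradient is simply the residual y - pred.
        residuals = y - pred
        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=0)
        tree.fit(X, residuals)
        # Shrunken step: each new tree corrects part of the current ensemble's error.
        pred += learning_rate * tree.predict(X)
        trees.append(tree)
    return base, learning_rate, trees

def predict_boosting(model, X):
    """Sum the constant start value and every tree's shrunken contribution."""
    base, learning_rate, trees = model
    pred = np.full(X.shape[0], base)
    for tree in trees:
        pred += learning_rate * tree.predict(X)
    return pred

# Toy regression data: a noisy sine curve.
rng = np.random.default_rng(0)
X = rng.uniform(0, 6, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

mses = {}
for n in (1, 10, 100):
    model = fit_boosting(X, y, n_trees=n)
    mses[n] = float(np.mean((y - predict_boosting(model, X)) ** 2))
    print(f"{n:>3} trees: training MSE = {mses[n]:.4f}")
```

Training error falls as trees are added; in practice the number of trees and the learning rate are tuned on validation data, since too many trees can overfit.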
GridSearchCV
• Grid Search Cross-Validation (GridSearchCV) is a hyperparameter tuning technique used in machine learning to
systematically search for the best combination of hyperparameter values for a given model.
1. Define the Hyperparameter Grid:
First, you need to specify the hyperparameters that you want to tune and the range of values you want to test.
For each hyperparameter, create a list of possible values to explore. This defines the hyperparameter grid.
2. Choose a Model:
Select the machine learning algorithm you want to use. This could be a classifier or a regressor, depending on
your problem.
3. Split the Data:
Divide your dataset into a training set and a held-out test set. Hyperparameters are tuned with cross-validation on
the training set only; the test set is reserved for the final evaluation of the selected model.
4. Perform Cross-Validation:
GridSearchCV uses k-fold cross-validation to evaluate each combination of hyperparameters. It divides the training
data into k subsets (folds) and iterates through them, each time using one fold for validation while training on the
other k-1 folds. The combination with the best mean validation score is selected and, by default, refit on the
entire training set.
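The four steps can be written out by hand to show what GridSearchCV automates. This sketch assumes scikit-learn, uses synthetic data, and tunes a KNN classifier over an illustrative grid; ParameterGrid expands the grid into all 4 × 2 = 8 combinations, and StratifiedKFold supplies the folds.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.model_selection import ParameterGrid, StratifiedKFold, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Step 1: the hyperparameter grid (values chosen for illustration only).
param_grid = {"n_neighbors": [3, 5, 7, 9], "weights": ["uniform", "distance"]}
# Step 2: the model to tune.
model = KNeighborsClassifier()
# Step 3: hold out a test set; tuning uses the training set only.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Step 4: k-fold cross-validation (k = 5) for every combination in the grid.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
results = []
for params in ParameterGrid(param_grid):
    fold_scores = []
    for train_idx, val_idx in cv.split(X_train, y_train):
        est = clone(model).set_params(**params)        # fresh, untrained copy
        est.fit(X_train[train_idx], y_train[train_idx])  # train on k-1 folds
        fold_scores.append(est.score(X_train[val_idx], y_train[val_idx]))  # validate on 1
    results.append((np.mean(fold_scores), params))

# Pick the best mean validation score, refit on all training data, test once.
best_score, best_params = max(results, key=lambda r: r[0])
best_model = clone(model).set_params(**best_params).fit(X_train, y_train)
print(best_params, round(best_score, 3), round(best_model.score(X_test, y_test), 3))
```

GridSearchCV(KNeighborsClassifier(), param_grid, cv=5) performs the same kind of search in one call; its default folds for classifiers are not shuffled, so individual scores may differ slightly from this loop.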
Hyperparameters
• The best XGBoost hyperparameters found with GridSearchCV for Kaggle's Heart Disease Cleveland, Hungary,
Switzerland & Long Beach V dataset.
• The best XGBoost hyperparameters found with GridSearchCV for the Heart Disease UCI Kaggle dataset.
Image source: https://ieeexplore.ieee.org/document/9751602/
With vs. Without GridSearchCV
Image source: https://ieeexplore.ieee.org/document/9751602/
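A sketch of a with/without comparison in scikit-learn, assuming synthetic data and using GradientBoostingClassifier as a stand-in for XGBoost. The grid values are illustrative, not the hyperparameters reported in the paper. The default model is trained once, while GridSearchCV runs 5-fold cross-validation over 12 combinations on the training set and then refits the best one.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data with a stratified train/test split.
X, y = make_classification(n_samples=300, n_features=10, n_informative=5, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=1)

# Without GridSearchCV: library default hyperparameters.
default_model = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)
default_acc = default_model.score(X_test, y_test)

# With GridSearchCV: 5-fold CV over a small illustrative grid (2 x 3 x 2 = 12 combinations).
param_grid = {
    "n_estimators": [50, 100],
    "learning_rate": [0.05, 0.1, 0.2],
    "max_depth": [2, 3],
}
search = GridSearchCV(GradientBoostingClassifier(random_state=0), param_grid,
                      cv=5, scoring="accuracy")
search.fit(X_train, y_train)   # refit=True (default) retrains the best model on all of X_train
tuned_acc = search.score(X_test, y_test)

print("best hyperparameters:", search.best_params_)
print(f"test accuracy without tuning: {default_acc:.3f}")
print(f"test accuracy with tuning:    {tuned_acc:.3f}")
```

Whether the tuned model beats the default on a particular split depends on the data; the comparison is fair only because the test set played no part in the search.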
Thank You
