PPT_ML.pptx______________________________________

Efficient Medical Diagnosis of Human Heart
Diseases Using Machine Learning Techniques
With and Without GridSearchCV
PRESENTED BY: V LALITHA
Authors: Ghulab Nabi Ahmad, Hira Fatima, Shafiullah, Abdelaziz Salah and Imadadullah
IEEE Access

Introduction
• Cardiovascular diseases are a significant global health concern, accounting for a substantial portion of annual
deaths. Early and accurate diagnosis of heart diseases is essential for effective treatment and management.
• Machine learning has emerged as a valuable tool for medical diagnosis, offering the potential for improved
accuracy and efficiency.
• The research paper focuses on utilizing machine learning techniques to enhance the diagnosis of human heart
diseases.
• The primary aim is to improve the accuracy of heart disease diagnosis by comparing several machine learning
classifiers with and without hyperparameter optimization (GridSearchCV).

Model’s Flow Diagram
1. Data Collection:
Two datasets are used: Kaggle's Heart Disease Cleveland, Hungary, Switzerland & Long Beach V dataset and the Heart Disease UCI Kaggle dataset.
2. Data Pre-Processing:
Data Cleaning, Data Transformation, Feature Selection, Data Splitting, Handling Class Imbalance.
3. Data Mining:
The process of extracting valuable insights, patterns, and knowledge from large datasets, here from data
related to human heart diseases.
4. Proposed Model:
The proposed model involves the application of machine learning algorithms like Logistic Regression, k-Nearest Neighbors (K-NN),
Support Vector Machine (SVM), and XGBoost.
Image source: https://ieeexplore.ieee.org/document/9751602/

Heart Disease Prediction Proposed Model

Logistic Regression
Logistic Regression is primarily used for binary classification problems, where you want to predict one of two
possible outcomes.
1. Binary Classification: Logistic Regression is employed when you want to classify data into one of two classes,
such as Yes/No, True/False, Spam/Not Spam, or 1/0.
2. Model: It uses a logistic (sigmoid) function to transform a linear combination of input features into a
probability score between 0 and 1.
3. Probability: The logistic function maps the output to a probability, where values closer to 1 indicate a high
probability of belonging to one class, and values closer to 0 indicate a high probability of belonging to the other
class.
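
The mapping from a linear combination of features to a probability can be sketched in a few lines. This is a minimal illustration, not the paper's model: the weights, bias, and sample values are hypothetical, and the 0.5 decision threshold is an assumed default.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic function: maps any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(features, weights, bias):
    """Probability of the positive class for a single sample."""
    # Linear combination of the input features, then squashed by the sigmoid.
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)

# Hypothetical weights and inputs, chosen only for illustration.
weights = [0.8, -0.5]
bias = -0.2
sample = [1.5, 0.4]

p = predict_proba(sample, weights, bias)
label = 1 if p >= 0.5 else 0  # assumed threshold of 0.5
print(f"p = {p:.3f}, predicted class = {label}")
```

In a trained model the weights and bias would be learned from the data rather than set by hand.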

K-Nearest Neighbours (KNN)
• K-Nearest Neighbors (KNN) is a machine learning algorithm for classification and regression.
• It predicts outcomes by comparing a new data point with its k nearest neighbors from the training data.
• The choice of k strongly affects the result: a small k is sensitive to noise, while a large k smooths over local structure.
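
The neighbor-voting idea can be sketched as follows. This is a minimal from-scratch illustration on made-up 2-D points, assuming Euclidean distance and a simple majority vote; it is not the implementation used in the paper.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Pair each training point's Euclidean distance to the query with its label.
    distances = [(math.dist(x, query), y) for x, y in zip(train_X, train_y)]
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [y for _, y in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Toy data, made up for illustration: class 0 near the origin, class 1 near (5, 5).
train_X = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5), (5, 6)]
train_y = [0, 0, 0, 1, 1, 1]

print(knn_predict(train_X, train_y, (0.5, 0.5), k=3))
print(knn_predict(train_X, train_y, (5.5, 5.5), k=3))
```

Using an odd k with two classes avoids tied votes.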

Support Vector Machine
Support Vector Machine (SVM) is a powerful supervised machine learning algorithm used for classification and
regression tasks.
1. Objective: SVM aims to find the hyperplane that best separates different classes in the feature space while
maximizing the margin between the classes.
2. Margin: The margin is the distance between the hyperplane and the nearest data points from each class. SVM
looks for the hyperplane with the largest margin.
3. Kernel Trick: SVM can handle both linear and non-linear data by using kernel functions to map data into
higher-dimensional spaces.
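
To make the margin concrete, the sketch below scores two candidate hyperplanes on toy data and shows that one leaves a wider margin. It only compares hand-picked candidates; an actual SVM finds the maximum-margin hyperplane by solving an optimization problem. The data, the +1/-1 label encoding, and both hyperplanes are assumptions made for illustration.

```python
import math

def signed_distance(w, b, x):
    """Signed distance from point x to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm

def margin(w, b, points, labels):
    """Smallest label-adjusted distance; positive only if every point is on its correct side."""
    return min(y * signed_distance(w, b, x) for x, y in zip(points, labels))

# Toy, linearly separable data with labels encoded as -1 / +1 (illustrative only).
points = [(1, 1), (2, 0), (4, 4), (5, 3)]
labels = [-1, -1, 1, 1]

# Two hypothetical hyperplanes that both separate the classes.
w_a, b_a = [1.0, 1.0], -5.0   # the line x1 + x2 = 5
w_b, b_b = [1.0, 0.0], -3.0   # the line x1 = 3

margin_a = margin(w_a, b_a, points, labels)
margin_b = margin(w_b, b_b, points, labels)
print(f"margin A = {margin_a:.3f}, margin B = {margin_b:.3f}")
```

Both candidates classify every point correctly, but the first keeps the nearest points farther away, which is the property SVM optimizes for.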

Extreme Gradient Boosting
Extreme Gradient Boosting, often referred to as XGBoost, is a powerful and popular machine learning algorithm
that belongs to the ensemble learning family, specifically gradient boosting.
1. Ensemble Learning: XGBoost is an ensemble learning technique, which means it combines the predictions of
multiple weaker models (typically decision trees) to create a strong predictive model. This ensemble approach
helps improve predictive accuracy.
2. Gradient Boosting: XGBoost employs a gradient boosting framework, which is a sequential training method. It
builds a series of decision trees where each tree corrects the errors made by the previous one. This leads to a
more accurate and robust model.
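
The sequential error-correcting idea can be sketched with plain gradient boosting on squared error, using one-split decision stumps as the weak learners. This is a simplified illustration, not XGBoost itself (which adds regularization, second-order gradients, and many engineering optimizations); the toy data, number of rounds, and learning rate are assumptions.

```python
def fit_stump(xs, residuals):
    """Find the single-threshold split on x that minimizes squared error on the residuals."""
    best = None
    # Exclude the largest x so the right-hand side of the split is never empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, split, left_value, right_value = best
    return lambda x: left_value if x <= split else right_value

def gradient_boost(xs, ys, n_rounds=20, learning_rate=0.3):
    """Fit stumps one after another, each to the residuals left by the ensemble so far."""
    base = sum(ys) / len(ys)  # start from the mean prediction
    stumps = []

    def predict(x):
        return base + learning_rate * sum(stump(x) for stump in stumps)

    mse_history = []  # training error measured before each round
    for _ in range(n_rounds):
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        mse_history.append(sum(r * r for r in residuals) / len(ys))
        stumps.append(fit_stump(xs, residuals))
    return predict, mse_history

# Toy 1-D data, made up for illustration: a noisy step from about 1 to about 3.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1]

predict, mse_history = gradient_boost(xs, ys)
print(f"training MSE: first round {mse_history[0]:.4f}, last round {mse_history[-1]:.4f}")
```

Each new stump is trained on what the current ensemble still gets wrong, so the training error does not increase from one round to the next.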

GridSearchCV
• Grid Search Cross-Validation (GridSearchCV) is a hyperparameter tuning technique used in machine learning to
systematically search for the best combination of hyperparameter values for a given model.
1. Define the Hyperparameter Grid:
First, you need to specify the hyperparameters that you want to tune and the range of values you want to test.
For each hyperparameter, create a list of possible values to explore. This defines the hyperparameter grid.
2. Choose a Model:
Select the machine learning algorithm you want to use. This could be a classifier or a regressor, depending on
your problem.

GridSearchCV
3. Split the Data:
Divide your dataset into two parts: a training set for model training and a validation set for hyperparameter
tuning. The validation set is used to assess the performance of the different hyperparameter combinations.
4. Perform Cross-Validation:
GridSearchCV uses k-fold cross-validation to evaluate each combination of hyperparameters. It divides the training
data into k subsets (folds) and iterates through them, using one as the validation set while training on the other
k-1 sets.
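
The steps above can be sketched from scratch: build every combination from the grid, score each with k-fold cross-validation, and keep the best. This is an illustration of the idea only, not scikit-learn's `GridSearchCV` implementation; the k-NN model, the parameter names `k` and `p`, the interleaved fold assignment, and the toy data are all assumptions.

```python
import itertools
from collections import Counter

def knn_predict(train_X, train_y, query, k, p):
    """k-NN majority vote using the Minkowski distance of order p."""
    def dist(a, b):
        return sum(abs(ai - bi) ** p for ai, bi in zip(a, b)) ** (1.0 / p)
    nearest = sorted(zip(train_X, train_y), key=lambda pair: dist(pair[0], query))[:k]
    return Counter(y for _, y in nearest).most_common(1)[0][0]

def k_fold_indices(n_samples, n_folds):
    """Yield (train_idx, val_idx); fold i validates on every n_folds-th sample starting at i."""
    for fold in range(n_folds):
        val_idx = [i for i in range(n_samples) if i % n_folds == fold]
        train_idx = [i for i in range(n_samples) if i % n_folds != fold]
        yield train_idx, val_idx

def cv_accuracy(X, y, params, n_folds):
    """Mean validation accuracy of k-NN with `params` across all folds."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(X), n_folds):
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        correct = sum(knn_predict(train_X, train_y, X[i], **params) == y[i] for i in val_idx)
        scores.append(correct / len(val_idx))
    return sum(scores) / len(scores)

def grid_search(X, y, param_grid, n_folds=3):
    """Evaluate every combination in the grid; return the best parameters and their score."""
    names = list(param_grid)
    best_params, best_score = None, -1.0
    for values in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = cv_accuracy(X, y, params, n_folds)
        if score > best_score:  # ties keep the earlier combination
            best_params, best_score = params, score
    return best_params, best_score

# Toy, well-separated data (illustrative only); classes alternate so every fold sees both.
X = [(0, 0), (5, 5), (1, 0), (6, 5), (0, 1), (5, 6),
     (1, 1), (6, 6), (0.5, 0.5), (5.5, 5.5), (1, 0.5), (6, 5.5)]
y = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1]

param_grid = {"k": [1, 3, 5], "p": [1, 2]}  # 3 x 2 = 6 combinations
best_params, best_score = grid_search(X, y, param_grid)
print(best_params, best_score)
```

The cost grows with the product of the grid sizes times the number of folds, which is why grids are usually kept small.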

Hyperparameters
• The best hyperparameters of XGBoost with the optimization technique for Kaggle’s Heart Disease Cleveland,
Hungary, Switzerland & Long Beach V dataset.
• The best hyperparameters of XGBoost with the optimization technique for the Heart Disease UCI Kaggle dataset.

With vs. Without GridSearchCV
