BRAIN STROKE PREDICTION
USING MACHINE LEARNING
TECHNIQUES
BY
NAME : S.RAJAYOGHA
BRANCH : M.SC DATA SCIENCE
REVIEW : FINAL REVIEW
GUIDE NAME : DR.K.SATHESH KUMAR
REVIEW DATE : 08/05/2023
1
CONTENT
• ABSTRACT
• INTRODUCTION
• SYMPTOMS
• EXISTING SYSTEM
• PROPOSED SYSTEM
• DATASET MODULES
• ARCHITECTURE
2
CONTENT(contd.,)
• DIAGRAMS
• ALGORITHMS
• EXPECTED OUTCOMES
• RESULT
• CONCLUSION
• REFERENCE
• PUBLICATION
3
ABSTRACT
• Stroke is a destructive illness that typically affects individuals over
the age of 65 years.
• Prediction of stroke is a time-consuming and tedious task for doctors.
• Five different algorithms are used and their results are compared to find
the best accuracy.
• The aim is to create an application with a user-friendly interface that is
easy to navigate and makes it simple to enter inputs.
4
INTRODUCTION
• A stroke is a life-threatening condition that happens when part of your
brain doesn't have enough blood flow.
• An ischemic stroke is caused by a blockage cutting off the blood
supply to the brain. This is the most common type of stroke.
• A hemorrhagic stroke is caused by bleeding in or around the brain.
• A transient ischemic attack or TIA is also known as a mini-stroke.
5
INTRODUCTION(contd.,)
• Hemorrhagic strokes are particularly dangerous because they cause
severe symptoms that get worse quickly.
• The stages of stroke recovery are Stage 1: Flaccidity (soft, limp muscles),
Stage 2: Spasticity (stiff, clumsy movement), Stage 3: Increased Spasticity,
and Stage 4: Decreased Spasticity.
• Foods high in potassium, such as sweet and white potatoes, bananas,
tomatoes, prunes, melon, and soybeans, can help maintain healthy
blood pressure, the leading risk factor for stroke.
6
SYMPTOMS
7
EXISTING SYSTEM
• In recent times, stress levels in individuals are at an all-time high,
which increases the chances of stroke.
• About 3.0 million deaths resulted from ischemic stroke, while 3.3
million deaths resulted from hemorrhagic stroke. Hence, correctly detecting
the presence of stroke in a patient becomes essential.
• In the existing system, various medical instruments are available on the
market for predicting brain stroke, but they are very expensive and not
efficient enough to calculate the chance of having a brain stroke.
8
DISADVANTAGE OF EXISTING SYSTEM
• It takes a lot of time to detect the disease.
• Results are inaccurate and inefficient.
• This can lead to incomplete data collection.
• Private data collected in hospitals is not kept safe.
• The instruments may not be universally accessible or adopted by all hospital
providers or institutions.
9
PROPOSED SYSTEM
• Artificial Intelligence contributes various algorithms that are
effective in making decisions and predictions from the large quantity
of data produced by the healthcare industry.
• Based on the proposed problem, ML provides different
classification algorithms to predict the probability of a patient having
a brain stroke.
• Unlike the expensive instruments of the existing system, the proposed
system is a low-cost, flexible software solution that is efficient enough
to calculate the chance of a brain stroke.
10
ADVANTAGE OF PROPOSED SYSTEM
• It detects brain stroke in less time.
• More accuracy and efficiency.
• Private data collected in hospitals is kept safe.
• It can be universally accessible and adopted by all hospital providers
and institutions.
11
Software configuration
12
Frontend and backend
Using Python as the frontend and MySQL as the backend in a
healthcare stroke-data project can provide several benefits:
• 1. Python is a popular programming language for data analysis and
visualization, which can be useful in analyzing stroke data.
• 2. MySQL can handle large amounts of data and can be easily scaled to
meet the needs of the project.
• 3. The combination of Python and MySQL provides seamless
integration between the front end and back end, making it easier to
manage and analyze data.
13
Frontend and backend
• 4. Python has a wide range of libraries and frameworks that can be used
to build interactive and user-friendly interfaces for the project.
• 5. MySQL is known for its reliability and stability, which is crucial in a
healthcare project where the accuracy and consistency of data are critical.
• 6. Overall, using Python for the front end and MySQL for the back end
in a healthcare stroke-data project provides a powerful and efficient
solution for managing and analyzing healthcare data; a minimal connection
sketch is shown below.
14
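As a minimal sketch of how the two layers could talk to each other, using the
mysql-connector-python driver; the database name `stroke_db`, the `patients`
table, and the credentials are hypothetical placeholders, not the project's
actual configuration:

```python
# Hedged sketch: hypothetical MySQL schema and credentials for illustration.
import mysql.connector

conn = mysql.connector.connect(
    host="localhost", user="root", password="secret", database="stroke_db"
)
cur = conn.cursor()

# Insert one patient record collected from the web form.
cur.execute(
    "INSERT INTO patients (age, hypertension, avg_glucose_level, bmi) "
    "VALUES (%s, %s, %s, %s)",
    (67, 1, 228.69, 36.6),
)
conn.commit()

# Read records back for analysis on the Python side.
cur.execute("SELECT age, hypertension, avg_glucose_level, bmi FROM patients")
for row in cur.fetchall():
    print(row)

cur.close()
conn.close()
```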
MODULES
Dataset collection: the dataset was obtained from the Kaggle website,
https://www.kaggle.com/healthcare-dataset-stroke-data.
15
MODULES
Balancing Dataset:
• There were 5110 rows and 12 columns in this dataset.
• The value of the output column stroke is either 1 or 0. The number
0 indicates that no stroke risk was identified, while the value 1
indicates that a stroke risk was detected.
• The number of 0 values in the output column (stroke) far exceeds the
number of 1 values, so the dataset is imbalanced and needs to be balanced
before training; a balancing sketch is shown after this slide. The meaning of
the 0 and 1 values in this dataset is given below.
0 = not a stroke
1 = stroke
16
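The slides do not state which balancing technique was applied, so the sketch
below shows one common option, random oversampling of the minority class with
scikit-learn; the CSV file name is assumed to match the Kaggle download:

```python
# Illustrative balancing sketch; the actual method used in the project may differ.
import pandas as pd
from sklearn.utils import resample

df = pd.read_csv("healthcare-dataset-stroke-data.csv")  # assumed local file name
print(df["stroke"].value_counts())  # 0 = no stroke, 1 = stroke (heavily imbalanced)

majority = df[df["stroke"] == 0]
minority = df[df["stroke"] == 1]

# Oversample the minority class until both classes are the same size.
minority_upsampled = resample(
    minority, replace=True, n_samples=len(majority), random_state=42
)
balanced = pd.concat([majority, minority_upsampled])
print(balanced["stroke"].value_counts())
```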
MODULES
Preprocessing
 Before building a model, data preprocessing is required to remove unwanted
noise and outliers from the dataset that could lead the model to depart from its
intended training.
 This stage addresses everything that prevents the model from functioning more
efficiently. Following the collection of the relevant dataset, the data must be
cleaned and prepared for model development. As stated before, the dataset used has
twelve characteristics.
 To improve accuracy, data preprocessing is also used to balance the data. The
output column contains the total number of stroke and non-stroke records before
preprocessing. A short preprocessing sketch is shown after this slide.
17
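A preprocessing sketch under stated assumptions: the deck only says the data is
cleaned and has twelve columns, so the concrete steps below (dropping the id
column, imputing missing BMI values, label-encoding the text columns) are
illustrative choices rather than the project's exact pipeline:

```python
# Illustrative preprocessing steps for the Kaggle stroke dataset.
import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.read_csv("healthcare-dataset-stroke-data.csv")

df = df.drop(columns=["id"])                      # identifier carries no signal
df["bmi"] = df["bmi"].fillna(df["bmi"].median())  # fill missing BMI values

# Convert text columns such as gender, work_type and smoking_status to numbers.
for col in df.select_dtypes(include="object").columns:
    df[col] = LabelEncoder().fit_transform(df[col])

print(df.dtypes)
```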
MODULES
• Correlation Matrix:
In the correlation heatmap (FIG 1), we can see that there is no
multicollinearity present, and ‘Age’ and ‘Glucose Level’ are among the
features most strongly correlated with ‘Stroke’.
• Best Features using Chi-Square Test:
The chi-square test shows that Age, Average Glucose Level and Hypertension
are the top 3 features having the maximum impact on the output ‘Stroke’.
A code sketch of both analyses is shown after the correlation figures.
18
Correlation matrix
FIG 1: CORRELATION MATRIX
19
Correlation matrix
FIG 2: DATA VISUALIZATION OF PARAMETERS IN
CORRELATION
20
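A sketch of the two analyses described above, assuming a fully numeric
dataframe `df` produced by the preprocessing step, with column names as in the
Kaggle dataset:

```python
# Correlation heatmap and chi-square feature ranking (illustrative sketch).
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.feature_selection import SelectKBest, chi2

# Correlation heatmap (FIG 1): check for multicollinearity between features.
sns.heatmap(df.corr(), annot=True, fmt=".2f", cmap="coolwarm")
plt.title("Correlation matrix")
plt.show()

# Chi-square scores: rank features by their impact on the 'stroke' column.
X = df.drop(columns=["stroke"])
y = df["stroke"]
selector = SelectKBest(score_func=chi2, k=3).fit(X, y)  # chi2 needs non-negative inputs
scores = sorted(zip(X.columns, selector.scores_), key=lambda t: t[1], reverse=True)
for name, score in scores[:3]:
    print(f"{name}: {score:.1f}")
```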
MODULES
Evaluation Metrics
 The confusion matrix is a tool for evaluating the performance of
machine learning classification algorithms. The confusion matrix has been
used to test the efficiency of all the models created. It illustrates how
often our models predict correctly and how often they predict incorrectly.
 Incorrectly predicted values are counted as false positives and false
negatives, whereas correctly predicted values are counted as true positives
and true negatives. After grouping all predicted values in the matrix, each
model's accuracy, precision-recall trade-off, and AUC were used to assess
its performance. An evaluation sketch is shown after this slide.
21
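A hedged evaluation sketch in scikit-learn terms; `model`, `X_test`, and
`y_test` are assumed to come from the training step shown after the
architecture slide:

```python
# Confusion matrix, precision/recall/accuracy, and AUC for one trained model.
from sklearn.metrics import confusion_matrix, classification_report, roc_auc_score

y_pred = model.predict(X_test)

print(confusion_matrix(y_test, y_pred))        # true/false positive and negative counts
print(classification_report(y_test, y_pred))   # precision, recall, F1, accuracy
print("AUC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))
```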
ARCHITECTURAL DESIGN FOR PROPOSED SYSTEM
FIG 3: ARCHITECTURE. The flow is: Start → Collect dataset → Data cleaning →
Perform data balancing → Split data into training data (80%) and testing data
(20%) → Classifier training (Logistic Regression, Random Forest, Decision
Tree, XGBoost) → Output: Stroke: Yes / Stroke: No.
A code sketch of this pipeline is shown after this slide.
22
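A compact sketch of the pipeline in FIG 3, assuming a balanced dataframe
`balanced` from the earlier balancing step; hyperparameters are left at their
defaults except where noted and are not taken from the slides:

```python
# 80/20 split and training of the four classifiers named in the architecture.
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier
from sklearn.metrics import accuracy_score

X = balanced.drop(columns=["stroke"])   # `balanced` from the balancing step
y = balanced["stroke"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=42
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(),
    "Random Forest": RandomForestClassifier(),
    "XGBoost": XGBClassifier(eval_metric="logloss"),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, accuracy_score(y_test, model.predict(X_test)))
```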
ER DIAGRAM (User)
FIG 4: ER DIAGRAM FOR USER
23
ER DIAGRAM (Admin)
FIG 5: ER DIAGRAM FOR ADMIN
24
DATA FLOW DIAGRAM
FIG 6: DATA FLOW DIAGRAM
25
USE CASE DIAGRAM (User)
FIG 7: USE CASE DIAGRAM FOR USER
26
USE CASE DIAGRAM (Admin)
FIG 8:USE CASE DIAGRAM FOR ADMIN
27
ALGORITHM/TECHNIQUE USED WITH
COMPLEXITY
Extreme Gradient Boosting Classifier:
 XGBoost is a decision-tree-based ensemble machine learning
algorithm that uses a gradient boosting framework.
 In prediction problems involving unstructured data (images, text,
etc.), artificial neural networks tend to outperform all other algorithms or
frameworks. However, when it comes to small-to-medium
structured/tabular data, decision-tree-based algorithms are considered
best-in-class right now. A short usage sketch is shown after this slide.
28
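A short usage sketch; the slides do not list the XGBoost hyperparameters used,
so the values below are typical illustrative choices, and `X_train`/`X_test`
come from the 80/20 split shown earlier:

```python
# Illustrative XGBoost configuration (hyperparameter values are assumptions).
from xgboost import XGBClassifier

xgb = XGBClassifier(
    n_estimators=200,      # number of boosted trees
    learning_rate=0.1,     # shrinkage applied to each tree's contribution
    max_depth=4,           # depth of each decision tree
    eval_metric="logloss",
)
xgb.fit(X_train, y_train)
print("XGBoost accuracy:", xgb.score(X_test, y_test))
```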
ALGORITHM/TECHNIQUE USED WITH
COMPLEXITY
Random Forest:
 Random Forest is a popular ML algorithm that belongs to the
supervised learning technique.
 It can be used for both classification and regression problems in ML.
 It is based on the idea of ensemble learning, which is the process of
combining multiple classifiers to solve a complex problem. A short usage
sketch is shown after this slide.
29
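An illustrative sketch under the same assumed setup; besides prediction, a
fitted Random Forest exposes feature importances, which is useful for checking
which inputs drive the stroke prediction:

```python
# Random Forest ensemble of decision trees; hyperparameters are illustrative.
from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)  # X_train/y_train from the 80/20 split shown earlier

for name, importance in sorted(
    zip(X_train.columns, rf.feature_importances_), key=lambda t: -t[1]
):
    print(f"{name}: {importance:.3f}")
```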
ALGORITHM/TECHNIQUE USED WITH
COMPLEXITY
Logistic Regression:
 Logistic regression is a statistical model that in its basic form uses a
logistic function to model a binary dependent variable, although many more
complex extensions exist.
 In regression analysis, logistic regression (or logit regression) estimates
the parameters of a logistic model (a form of binary regression).
 The log-odds of the outcome are modeled as a linear combination of one or
more independent variables ("predictors"); each independent variable can be a
binary variable (two classes, coded by an indicator variable) or a continuous
variable (any real value). A worked illustration is shown after this slide.
30
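A worked illustration of the logistic model described above: the predicted
probability is the logistic (sigmoid) function applied to a linear combination
of the predictors. The coefficient values below are made up purely for
demonstration and are not the fitted model's coefficients:

```python
# Logistic function applied to a linear combination of predictors (toy weights).
import numpy as np

def predict_stroke_probability(age, hypertension, avg_glucose_level):
    # hypothetical coefficients: intercept plus one weight per predictor
    z = -7.0 + 0.07 * age + 0.5 * hypertension + 0.004 * avg_glucose_level
    return 1.0 / (1.0 + np.exp(-z))  # logistic (sigmoid) function

print(predict_stroke_probability(age=67, hypertension=1, avg_glucose_level=228.69))
```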
EXPECTED OUTCOMES
Home Login
The above screenshot shows the home login page.
31
EXPECTED OUTCOMES
Patient Login
The above screenshot shows the patient login page.
32
EXPECTED OUTCOMES
Stroke prediction by giving patient’s data
The above screenshot shows the prediction form.
33
EXPECTED OUTCOMES
Display whether stroke or not
The above screenshot shows the prediction result.
34
EXPECTED OUTCOMES
Doctor login
The above screenshot shows the doctor login page.
35
EXPECTED OUTCOMES
Logistic Regression
The above screenshot shows the logistic regression results.
36
EXPECTED OUTCOMES
Decision Tree
The above screenshot shows the decision tree results.
37
EXPECTED OUTCOMES
Random Forest
The above screenshot shows the random forest results.
38
EXPECTED OUTCOMES
XgBoost
The above screenshot shows the XGBoost results.
39
EXPECTED OUTCOMES
Algorithm Comparison
The above screenshot shows the algorithm comparison.
40
RESULT
Algorithm results:
Algorithm           | F1 score | Precision | Recall | Accuracy
Logistic Regression | 0.81     | 0.80      | 0.81   | 0.81
Decision Tree       | 0.92     | 0.91      | 0.94   | 0.92
Random Forest       | 0.95     | 0.93      | 0.96   | 0.95
XGBoost             | 0.96     | 0.96      | 0.96   | 0.96
41
Therefore, this project helps to predict patients who are at risk of brain
stroke by cleaning the dataset and applying the models to get an average
accuracy of 96.68%; the highest accuracy is achieved by XGBoost.
CONCLUSION
 The importance of knowing and understanding the risks of brain stroke is
very high in these trying times.
 The model predicts the probability of brain stroke on the basis of simple,
commonly known day-to-day parameters.
 This makes the project highly relevant and of great need to society. The
objective of implementing the project on a web platform was to reach as many
individuals as possible.
 An early warning can save the life of someone who might have a probability
of a stroke.
42
REFERENCES
• [1] Tasfia Ismail Shoily, Tajul Islam, Sharmin Akter Tanna,
"Detection of stroke using machine learning algorithms", 10th International
Conference on Computing, Communication and Networking Technologies
(ICCCNT), IEEE, July 2019.
• [2] JoonNyung Heo, Jihoon G. Yoon, Hyungjong Park, Young
Dae Kim, Hyo Suk Nam and Ji Hoe Heo, "Stroke prediction in acute
stroke", Stroke, 2019;50:1263-1265, AHA Journal, 20 Mar 2019.
• [3] Jaehak Yu, Damee Kim, Hongkyu Park, Sun-Jin Kim, Sungkyu
Yu, Sejin Park and Seunghee, "Semantic analysis of NIH stroke", 2019
International Conference on Platform Technology and Service (PlatCon),
IEEE, 30 Jan 2019.
43
PUBLICATION
• Rajayogha, S., & Bruxella, D. J. M. D. (2023, March 31). Early prediction
of brain stroke using logistic regression. International Journal for
Research in Applied Science and Engineering Technology, 11(3), 1355-1361.
https://doi.org/10.22214/ijraset.2023.49651
44
THANK YOU
45
