FINAL YEAR PROJECT
ON
BRAIN STROKE PREDICTION SYSTEM
Project Guide:
Dr. Shankar Thawkar
Hindustan College of Science and Technology, Farah, Mathura
Department of Information Technology
Group Members:
Prateek Chaudhary
Nishant Dixit
Gaurav Raj
Sachin Gautam
Group ID: N/A
Presentation Outline
1. Introduction to Project
2. Feasibility Study
3. System Architecture
4. Project Design
5. Detailed Discussion on Dataset
6. Techniques/Algorithms Used
Brain Stroke Prediction System
1. Introduction
• Project Title: NeuroRiskX - Optimized and Explainable Stroke Risk Prediction
• Purpose: Provide an accurate stroke risk prediction model that is both interpretable and optimized for clinical
relevance.
• Motivation: Stroke is a major health risk with severe consequences, making early prediction critical for prevention.
• Key Components:
• Machine Learning Model: Provides the predictive backbone for stroke risk.
• Explainable AI (XAI): SHAP allows for interpretability of model predictions.
• Genetic Algorithm (GA): Used to optimize the model’s performance by finding the best hyperparameters.
• Unique Value: Combines accuracy with explainability, fostering clinical trust in AI predictions.
2. Feasibility Study
•Technical Feasibility:
•Model Selection: RandomForest chosen for its accuracy and its compatibility with SHAP-based explainability.
•Tools and Libraries:
•Python: Main language for model development.
•DEAP Library: Facilitates genetic algorithm implementation.
•SHAP: Provides detailed explainability tools for individual predictions.
•Data Availability: Access to a stroke dataset that includes the features needed for prediction.
•Economic Feasibility:
•Development Costs: Minimal due to open-source tools.
•Operational Costs: Low, with potential to scale if deployed in a clinical setting.
•Operational Feasibility:
•Clinical Application: Provides easily interpretable results for non-technical stakeholders (e.g., clinicians).
•Deployment Readiness: Code structure (e.g., app.py) indicates a design suitable for integration into healthcare applications.
3. System Architecture
•Overview:
•Data Layer: Handles data loading and preprocessing. Inputs include the stroke dataset with health metrics and
demographic information.
•Model Layer:
•Initial model (RandomForest) for baseline performance.
•GA for hyperparameter tuning to refine model accuracy and robustness.
•Explainability Layer:
•Uses SHAP to interpret predictions, providing visual insights into feature importance at the individual
prediction level.
•Deployment Layer:
•app.py as the primary deployment interface, possibly a web application or API for real-time predictions (a minimal endpoint sketch follows these notes).
•Architecture Diagram: Consider a visual here showing data flow from dataset to model training, GA
optimization, and SHAP explainability.
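Since app.py is only described as "possibly a web application or API", the snippet below is a minimal sketch of what such an endpoint could look like, assuming Flask and a model serialized with joblib; the file name model.pkl and the /predict route are placeholders, not taken from the project code.

# Hypothetical app.py sketch: a Flask endpoint serving real-time stroke-risk predictions.
# Assumes the trained preprocessing+model pipeline was saved with joblib as "model.pkl"
# (the file name and route are placeholders, not taken from the project code).
from flask import Flask, request, jsonify
import joblib
import pandas as pd

app = Flask(__name__)
model = joblib.load("model.pkl")

@app.route("/predict", methods=["POST"])
def predict():
    # Expect a JSON object with the same feature fields used during training.
    record = pd.DataFrame([request.get_json()])
    risk = float(model.predict_proba(record)[0, 1])  # probability of the stroke class
    return jsonify({"stroke_risk": risk})

if __name__ == "__main__":
    app.run(debug=True)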
4. Project Design
•Model Design:
•RandomForest Classifier: Known for stability, interpretability, and high accuracy in structured data.
•Training and Testing: Dataset split into training and test sets to evaluate model performance and avoid overfitting (see the baseline sketch at the end of this slide).
•Optimization Design:
•Genetic Algorithm (GA):
•Hyperparameters Tuned: n_estimators (number of trees) and max_depth.
•Process: The GA evolves candidate models through selection, crossover, and mutation, maximizing the
cross-validated score.
•Explainability Design:
•SHAP Integration: Applied post-training to visualize feature impact on each prediction.
•Output: Explanations for individual predictions that highlight high-impact features (e.g., age, hypertension).
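The model design above can be sketched in a few lines; X and y below stand for the preprocessed feature matrix and stroke labels (an assumption, since the loading code is not shown), and the split ratio and default hyperparameters are illustrative rather than the tuned values.

# Baseline RandomForest with a stratified train/test split (illustrative settings).
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# X, y: preprocessed feature matrix and stroke labels (assumed to exist already).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

baseline = RandomForestClassifier(n_estimators=100, max_depth=None, random_state=42)
baseline.fit(X_train, y_train)
print(classification_report(y_test, baseline.predict(X_test)))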
5. Detailed Discussion On Dataset
•Dataset Overview:
•Features: Includes demographics and health metrics critical for stroke prediction:
•Demographics: Age, gender, marital status, work type.
•Medical History: Hypertension, heart disease, smoking status.
•Lifestyle Factors: Residence type (urban/rural).
•Target Variable:
•Binary Label: Indicates whether a stroke event occurred.
•Data Preprocessing (see the pipeline sketch at the end of this slide):
•Missing Values: Handled by imputation techniques.
•Encoding: Categorical variables (e.g., gender) encoded for model compatibility.
•Scaling: Standardization of numerical features to improve model training.
•Insights from Data Exploration:
•Potential correlations observed (e.g., age with stroke risk), informing the choice of features for prediction.
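As a concrete illustration of the preprocessing steps, the following scikit-learn sketch assumes column names in the style of the common stroke dataset (age, gender, work_type, Residence_type, smoking_status); the exact names and imputation strategies are assumptions, not taken from the project code.

# Imputation, encoding, and scaling combined in one scikit-learn ColumnTransformer.
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

numeric_cols = ["age"]  # assumed numeric column name(s)
categorical_cols = ["gender", "ever_married", "work_type",
                    "Residence_type", "smoking_status"]  # assumed categorical column names

preprocess = ColumnTransformer(
    transformers=[
        ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                          ("scale", StandardScaler())]), numeric_cols),
        ("cat", Pipeline([("impute", SimpleImputer(strategy="most_frequent")),
                          ("encode", OneHotEncoder(handle_unknown="ignore"))]), categorical_cols),
    ],
    remainder="passthrough",  # keeps already-binary columns such as hypertension, heart_disease
)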
6. Techniques/Algorithms Used
•RandomForest Classifier:
•Choice Rationale: Known for stability and for providing feature importance inherently, making it well suited to tabular medical data.
•Baseline Model: Trained to assess initial performance before optimization.
•Explainable AI (XAI) with SHAP:
•Objective: Provide individualized explanations, making predictions transparent for clinicians.
•Methodology: SHAP values offer insights into feature contributions, highlighting the impact of each feature on
individual predictions.
•Example: A SHAP force plot (sketched below) illustrates how SHAP decomposes the impact of each feature on a single prediction.
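Below is a minimal SHAP sketch, assuming a fitted RandomForest named model and test features X_test; the class/array indexing follows the classic TreeExplainer API and may need adjusting for newer SHAP releases.

# Per-prediction explanations with SHAP's tree explainer.
import shap

explainer = shap.TreeExplainer(model)        # fast, exact SHAP values for tree ensembles
shap_values = explainer.shap_values(X_test)  # one array per class with the classic API

# Force plot for one patient: how each feature pushes the risk above or below the baseline.
# (Newer SHAP releases may return a single 3-D array; adjust the indexing accordingly.)
shap.force_plot(explainer.expected_value[1], shap_values[1][0], X_test.iloc[0])

# Global view: which features matter most across the whole test set.
shap.summary_plot(shap_values[1], X_test)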
•Genetic Algorithm (GA) for Hyperparameter Optimization:
•Objective: Improve RandomForest performance by tuning n_estimators and max_depth.
•GA Process:
•Population Initialization: Randomly generate initial candidates.
•Selection: Choose top-performing individuals based on cross-validated model score.
•Crossover & Mutation: Recombine and perturb candidates to explore the parameter space.
•Outcome: Identify optimal hyperparameters that boost accuracy while preserving interpretability (see the DEAP sketch below).
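A condensed sketch of this GA loop with DEAP is given below; X and y are assumed to be the preprocessed features and labels, and the population size, search ranges, and generation count are illustrative rather than the project's actual settings.

# GA hyperparameter search with DEAP (illustrative settings).
import random
from deap import base, creator, tools, algorithms
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Maximize the cross-validated score; an individual is [n_estimators, max_depth].
creator.create("FitnessMax", base.Fitness, weights=(1.0,))
creator.create("Individual", list, fitness=creator.FitnessMax)

toolbox = base.Toolbox()
toolbox.register("n_estimators", random.randint, 50, 300)
toolbox.register("max_depth", random.randint, 2, 20)
toolbox.register("individual", tools.initCycle, creator.Individual,
                 (toolbox.n_estimators, toolbox.max_depth), n=1)
toolbox.register("population", tools.initRepeat, list, toolbox.individual)

def evaluate(ind):
    # Fitness = mean 5-fold cross-validated score of a RandomForest with these hyperparameters.
    clf = RandomForestClassifier(n_estimators=ind[0], max_depth=ind[1], random_state=42)
    return (cross_val_score(clf, X, y, cv=5).mean(),)  # X, y: preprocessed data (assumed)

toolbox.register("evaluate", evaluate)
toolbox.register("mate", tools.cxTwoPoint)
toolbox.register("mutate", tools.mutUniformInt, low=[50, 2], up=[300, 20], indpb=0.2)
toolbox.register("select", tools.selTournament, tournsize=3)

pop = toolbox.population(n=20)
pop, _ = algorithms.eaSimple(pop, toolbox, cxpb=0.5, mutpb=0.2, ngen=10, verbose=False)
best = tools.selBest(pop, k=1)[0]
print("Best hyperparameters:", {"n_estimators": best[0], "max_depth": best[1]})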
