FINAL YEAR PROJECT
ON
BRAIN STROKE PREDICTION SYSTEM
Project Guide:
Dr. Shankar Thawkar
Hindustan College of Science and Technology, Farah, Mathura
Department of Information Technology
Group Members:
Prateek Chaudhary
Nishant Dixit
Gaurav Raj
Sachin Gautam
Group ID: N/A
Presentation Outline
1. Introduction to Project
2. Feasibility Study
3. System Architecture
4. Project Design
5. Detailed Discussion on Dataset
6. Techniques/Algorithms Used
Brain Stroke Prediction System
1. Introduction
• Project Title: NeuroRiskX - Optimized and Explainable Stroke Risk Prediction
• Purpose: Provide an accurate stroke risk prediction model that is both interpretable and optimized for clinical
relevance.
• Motivation: Stroke is a major health risk with severe consequences, making early prediction critical for prevention.
• Key Components:
• Machine Learning Model: Provides the predictive backbone for stroke risk.
• Explainable AI (XAI): SHAP allows for interpretability of model predictions.
• Genetic Algorithm (GA): Used to optimize the model’s performance by finding the best hyperparameters.
• Unique Value: Combines accuracy with explainability, fostering clinical trust in AI predictions.
2. Feasibility Study
•Technical Feasibility:
•Model Selection: RandomForest chosen for its accuracy and its compatibility with SHAP-based explainability.
•Tools and Libraries:
•Python: Main language for model development.
•DEAP Library: Facilitates genetic algorithm implementation.
•SHAP: Provides detailed explainability tools for individual predictions.
•Data Availability: Access to a stroke dataset that includes the features needed for prediction.
•Economic Feasibility:
•Development Costs: Minimal due to open-source tools.
•Operational Costs: Low, with potential to scale if deployed in a clinical setting.
•Operational Feasibility:
•Clinical Application: Provides easily interpretable results for non-technical stakeholders (e.g., clinicians).
•Deployment Readiness: Code structure (e.g., app.py) indicates a design suitable for integration into healthcare applications.
3. System Architecture
•Overview:
•Data Layer: Handles data loading and preprocessing. Inputs include the stroke dataset with health metrics and
demographic information.
•Model Layer:
•Initial model (RandomForest) for baseline performance.
•GA for hyperparameter tuning to refine model accuracy and robustness.
•Explainability Layer:
•Uses SHAP to interpret predictions, providing visual insights into feature importance at the individual
prediction level.
•Deployment Layer:
•app.py as the primary deployment interface, possibly a web application or API for real-time predictions (a minimal endpoint sketch follows these notes).
•Architecture Diagram: Consider a visual here showing data flow from dataset to model training, GA
optimization, and SHAP explainability.
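Since app.py is only described as "possibly a web application or API", the snippet below is a minimal sketch of what such an endpoint could look like, assuming Flask and a model serialized with joblib; the file name model.pkl and the /predict route are placeholders, not taken from the project code.

# Hypothetical app.py sketch: a Flask endpoint serving real-time stroke-risk predictions.
# Assumes the trained preprocessing+model pipeline was saved with joblib as "model.pkl"
# (the file name and route are placeholders, not taken from the project code).
from flask import Flask, request, jsonify
import joblib
import pandas as pd

app = Flask(__name__)
model = joblib.load("model.pkl")

@app.route("/predict", methods=["POST"])
def predict():
    # Expect a JSON object with the same feature fields used during training.
    record = pd.DataFrame([request.get_json()])
    risk = float(model.predict_proba(record)[0, 1])  # probability of the stroke class
    return jsonify({"stroke_risk": risk})

if __name__ == "__main__":
    app.run(debug=True)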
4. Project Design
•Model Design:
•RandomForest Classifier: Known for stability, interpretability, and high accuracy in structured data.
•Training and Testing: Dataset split into training and test sets to evaluate model performance and avoid overfitting (see the baseline sketch at the end of this slide).
•Optimization Design:
•Genetic Algorithm (GA):
•Hyperparameters Tuned: n_estimators (number of trees) and max_depth.
•Process: The GA evolves candidate models through selection, crossover, and mutation, maximizing the
cross-validated score.
•Explainability Design:
•SHAP Integration: Applied post-training to visualize feature impact on each prediction.
•Output: Explanations for individual predictions that highlight high-impact features (e.g., age, hypertension).
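The model design above can be sketched in a few lines; X and y below stand for the preprocessed feature matrix and stroke labels (an assumption, since the loading code is not shown), and the split ratio and default hyperparameters are illustrative rather than the tuned values.

# Baseline RandomForest with a stratified train/test split (illustrative settings).
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# X, y: preprocessed feature matrix and stroke labels (assumed to exist already).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

baseline = RandomForestClassifier(n_estimators=100, max_depth=None, random_state=42)
baseline.fit(X_train, y_train)
print(classification_report(y_test, baseline.predict(X_test)))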
5. Detailed Discussion On Dataset
•Dataset Overview:
•Features: Includes demographics and health metrics critical for stroke prediction:
•Demographics: Age, gender, marital status, work type.
•Medical History: Hypertension, heart disease, smoking status.
•Lifestyle Factors: Residence type (urban/rural).
•Target Variable:
•Binary Label: Indicates whether a stroke event occurred.
•Data Preprocessing (see the pipeline sketch at the end of this slide):
•Missing Values: Handled by imputation techniques.
•Encoding: Categorical variables (e.g., gender) encoded for model compatibility.
•Scaling: Standardization of numerical features to improve model training.
•Insights from Data Exploration:
•Potential correlations observed (e.g., age with stroke risk), informing the choice of features for prediction.
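As a concrete illustration of the preprocessing steps, the following scikit-learn sketch assumes column names in the style of the common stroke dataset (age, gender, work_type, Residence_type, smoking_status); the exact names and imputation strategies are assumptions, not taken from the project code.

# Imputation, encoding, and scaling combined in one scikit-learn ColumnTransformer.
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

numeric_cols = ["age"]  # assumed numeric column name(s)
categorical_cols = ["gender", "ever_married", "work_type",
                    "Residence_type", "smoking_status"]  # assumed categorical column names

preprocess = ColumnTransformer(
    transformers=[
        ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                          ("scale", StandardScaler())]), numeric_cols),
        ("cat", Pipeline([("impute", SimpleImputer(strategy="most_frequent")),
                          ("encode", OneHotEncoder(handle_unknown="ignore"))]), categorical_cols),
    ],
    remainder="passthrough",  # keeps already-binary columns such as hypertension, heart_disease
)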
6. Techniques/Algorithms Used
•RandomForest Classifier:
•Choice Rationale: Known for stability and for providing feature importance inherently, making it well suited to tabular medical data.
•Baseline Model: Trained to assess initial performance before optimization.
•Explainable AI (XAI) with SHAP:
•Objective: Provide individualized explanations, making predictions transparent for clinicians.
•Methodology: SHAP values offer insights into feature contributions, highlighting the impact of each feature on
individual predictions.
•Example: A SHAP force plot (sketched below) illustrates how SHAP decomposes the impact of each feature on a single prediction.
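Below is a minimal SHAP sketch, assuming a fitted RandomForest named model and test features X_test; the class/array indexing follows the classic TreeExplainer API and may need adjusting for newer SHAP releases.

# Per-prediction explanations with SHAP's tree explainer.
import shap

explainer = shap.TreeExplainer(model)        # fast, exact SHAP values for tree ensembles
shap_values = explainer.shap_values(X_test)  # one array per class with the classic API

# Force plot for one patient: how each feature pushes the risk above or below the baseline.
# (Newer SHAP releases may return a single 3-D array; adjust the indexing accordingly.)
shap.force_plot(explainer.expected_value[1], shap_values[1][0], X_test.iloc[0])

# Global view: which features matter most across the whole test set.
shap.summary_plot(shap_values[1], X_test)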
•Genetic Algorithm (GA) for Hyperparameter Optimization:
•Objective: Improve RandomForest performance by tuning n_estimators and max_depth.
•GA Process:
•Population Initialization: Randomly generate initial candidates.
•Selection: Choose top-performing individuals based on cross-validated model score.
•Crossover & Mutation: Recombine and perturb candidates to explore the parameter space.
•Outcome: Identify optimal hyperparameters that boost accuracy while preserving interpretability (see the DEAP sketch below).
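A condensed sketch of this GA loop with DEAP is given below; X and y are assumed to be the preprocessed features and labels, and the population size, search ranges, and generation count are illustrative rather than the project's actual settings.

# GA hyperparameter search with DEAP (illustrative settings).
import random
from deap import base, creator, tools, algorithms
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Maximize the cross-validated score; an individual is [n_estimators, max_depth].
creator.create("FitnessMax", base.Fitness, weights=(1.0,))
creator.create("Individual", list, fitness=creator.FitnessMax)

toolbox = base.Toolbox()
toolbox.register("n_estimators", random.randint, 50, 300)
toolbox.register("max_depth", random.randint, 2, 20)
toolbox.register("individual", tools.initCycle, creator.Individual,
                 (toolbox.n_estimators, toolbox.max_depth), n=1)
toolbox.register("population", tools.initRepeat, list, toolbox.individual)

def evaluate(ind):
    # Fitness = mean 5-fold cross-validated score of a RandomForest with these hyperparameters.
    clf = RandomForestClassifier(n_estimators=ind[0], max_depth=ind[1], random_state=42)
    return (cross_val_score(clf, X, y, cv=5).mean(),)  # X, y: preprocessed data (assumed)

toolbox.register("evaluate", evaluate)
toolbox.register("mate", tools.cxTwoPoint)
toolbox.register("mutate", tools.mutUniformInt, low=[50, 2], up=[300, 20], indpb=0.2)
toolbox.register("select", tools.selTournament, tournsize=3)

pop = toolbox.population(n=20)
pop, _ = algorithms.eaSimple(pop, toolbox, cxpb=0.5, mutpb=0.2, ngen=10, verbose=False)
best = tools.selBest(pop, k=1)[0]
print("Best hyperparameters:", {"n_estimators": best[0], "max_depth": best[1]})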
