AGENDA
1. Introduction
2. Literature Review
3. Research Objective
4. Proposed Methodology
5. Result Analysis
6. Conclusion
7. References
INTRODUCTION
• Cardiovascular diseases (CVDs) are the leading global cause of death,
responsible for 17.9 million deaths annually [1].
• Low- and middle-income countries face critical challenges due to limited
healthcare resources.
• Advanced diagnostic tools (e.g., angiography, ECG) are often inaccessible due to
high costs and the need for specialized professionals [2].
• Traditional prediction models focus only on physical factors (e.g., cholesterol,
blood pressure, age) [3].
• Mental health factors like stress, sleep duration, and working hours, which impact
cardiovascular health, are often overlooked [2].
INTRODUCTION
• Machine learning provides a cost-effective approach to predicting heart disease
risk [4].
• This study integrates psychological factors like stress, work hours, and sleep into
predictive models.
• The Cleveland Heart Disease dataset is enriched with synthetic mental health
features for analysis.
• Personalized models based on age (<40 vs. ≥40) and gender were explored.
LITERATURE REVIEW
• Mohan et al. [1] highlighted the potential of hybrid machine learning models,
combining physical and psychological parameters, to improve predictive accuracy
for heart disease.
• Subahi et al. [5] proposed integrating stress levels, sleep patterns, and lifestyle
factors into datasets to enhance the prediction of heart disease.
• Yaqoob et al. [6] identified fasting blood sugar as a critical predictor of heart disease,
but pointed out the limitations of models that exclude multi-dimensional health
factors, particularly psychological ones.
• Soni et al. [7] demonstrated that advanced ML algorithms like Random Forest and
XGBoost outperform traditional models in predictive accuracy, effectively capturing
non-linear relationships between variables, but face challenges in model
explainability for clinical acceptance.
RESEARCH OBJECTIVE
• Integrates mental health indicators into heart disease prediction models using
the Cleveland Heart Disease dataset.
• Develops personalized machine learning models considering both physical and
mental health factors.
• Tailors models for specific age groups (under 40 and 40 or older) and gender to
evaluate personalized vs. generalized predictions.
• Employs multiple ML classifiers: Random Forest, XGBoost, KNN, SVM,
Logistic Regression, Decision Tree, and Naive Bayes.
PROPOSED METHODOLOGY
• Dataset Details: Cleveland Heart Disease Dataset (303 records, UC Irvine ML
Repository) [8].
• Attributes: Age, Sex, Cholesterol, Blood Pressure, Exercise-Induced Angina, etc.
• Synthetic Features Added: Stress Level (1–10), Sleep Duration (4–9 hours/day),
Work Hours (20–70 hours/week).
• Age Group: Young (<40 years) and Old (≥40 years)
• Data Pre-processing: Converted categorical variables (e.g., Sex, Chest Pain Type)
into numeric.
• Handled missing values via imputation or exclusion.
• Normalized data for scale-sensitive models (e.g., KNN, SVM); tree-based models
such as XGBoost do not depend on feature scale.
• Feature Engineering: Key features identified using Random Forest importance.
• Experimented with top 2 and top 4 features for better performance.
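The synthetic features, age grouping, and normalization listed above could be produced along the following lines. This is a minimal sketch using only the Python standard library. The slides do not state how the synthetic values were sampled, so uniform sampling, the fixed seed, and the names `add_synthetic_features`, `min_max_scale`, `stress_level`, `sleep_hours`, and `work_hours` are assumptions made for illustration.

```python
import random

def add_synthetic_features(records, seed=42):
    """Attach synthetic mental-health features and an age-group label.

    Assumption: values are drawn uniformly from the ranges on the slide;
    the sampling scheme actually used in the study is not stated.
    """
    rng = random.Random(seed)
    enriched = []
    for rec in records:
        row = dict(rec)  # copy so the input records stay untouched
        row["stress_level"] = rng.randint(1, 10)              # 1-10 scale
        row["sleep_hours"] = round(rng.uniform(4.0, 9.0), 1)  # hours/day
        row["work_hours"] = rng.randint(20, 70)               # hours/week
        row["age_group"] = "young" if row["age"] < 40 else "old"
        enriched.append(row)
    return enriched

def min_max_scale(values):
    """Rescale a list of numbers to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical records standing in for rows of the Cleveland dataset.
patients = [{"age": 35, "chol": 210}, {"age": 52, "chol": 260}, {"age": 61, "chol": 310}]
enriched = add_synthetic_features(patients)
scaled_chol = min_max_scale([p["chol"] for p in enriched])
```

Because the synthetic values are random draws, they carry no real relationship to the heart disease label unless one is deliberately built in, which is worth keeping in mind when reading feature-importance results.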
PROPOSED METHODOLOGY
• Model Development:
• ML Models: Decision Tree, Random Forest, KNN, SVM, Logistic Regression.
• Data split into 70% training and 30% testing for fair evaluation.
• Age Group-Based Analysis:
• Separate evaluation for Young (<40) and Old (≥40) groups.
• Analyzed performance patterns and age-related feature impact.
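The 70/30 split and the separate Young/Old evaluation can be sketched as follows. This is an illustrative sketch, not the study's code: the helper names, the fixed seed, the `target` field, and the stress-based placeholder rule (which stands in for a trained classifier) are all assumptions.

```python
import random

def train_test_split_70_30(rows, seed=0):
    """Shuffle a copy of the rows and split them 70% train / 30% test."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(0.7 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]

def accuracy_by_age_group(rows, predict):
    """Return {group: accuracy} for Young (<40) and Old (>=40) rows."""
    hits = {"young": 0, "old": 0}
    totals = {"young": 0, "old": 0}
    for row in rows:
        group = "young" if row["age"] < 40 else "old"
        totals[group] += 1
        hits[group] += int(predict(row) == row["target"])
    # Skip groups with no rows to avoid dividing by zero.
    return {g: hits[g] / totals[g] for g in totals if totals[g] > 0}

# Hypothetical test rows; `target` is 1 for heart disease, 0 otherwise.
rows = [
    {"age": 34, "stress_level": 8, "target": 1},
    {"age": 38, "stress_level": 3, "target": 0},
    {"age": 55, "stress_level": 9, "target": 1},
    {"age": 63, "stress_level": 2, "target": 1},
]

def rule(row):
    """Placeholder predictor, not a trained model."""
    return int(row["stress_level"] >= 7)

scores = accuracy_by_age_group(rows, rule)
```

With 303 records, this split yields 212 training rows and 91 test rows. The Young group in the Cleveland data is small, so per-group accuracies on a 30% test split rest on very few samples.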
MODELS USED FOR HEART DISEASE PREDICTION:
Decision Tree:
Splits data into branches based on feature values.
Advantages: Easy to interpret and visualize, straightforward
decision-making.
Disadvantages: Prone to overfitting and high variance without pruning.
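To illustrate how a tree chooses where to split, the sketch below searches one feature for the threshold that minimizes weighted Gini impurity. The slides do not name the split criterion, so Gini (as in CART-style trees) is an assumption, and the cholesterol values and labels are made up.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def best_threshold(values, labels):
    """Find the threshold on one feature that minimizes weighted Gini impurity."""
    best = (None, float("inf"))
    for t in sorted(set(values))[:-1]:  # candidate split: value <= t
        left = [y for v, y in zip(values, labels) if v <= t]
        right = [y for v, y in zip(values, labels) if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (t, score)
    return best

# Hypothetical cholesterol values and disease labels.
chol = [180, 200, 220, 260, 280, 300]
disease = [0, 0, 0, 1, 1, 1]
threshold, impurity = best_threshold(chol, disease)
```

A full tree repeats this search over every feature and recurses into each branch; pruning or a depth limit is what keeps that recursion from memorizing the training data.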
Random Forest:
Ensemble model that averages multiple decision trees for predictions.
Advantages: Reduces overfitting, robust with noisy data, better generalization.
Disadvantages: Slower predictions and increased complexity with more trees.
K-Nearest Neighbors (KNN):
Classifies data points based on the majority class of their nearest neighbors.
Advantages: Simple and effective for small datasets, intuitive approach.
Disadvantages: Computationally intensive for large datasets, sensitive to scaling.
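The sensitivity to scaling can be seen in a small sketch. The points, the feature pair (cholesterol, stress level), and the fixed min/max bounds below are hypothetical; `math.dist` requires Python 3.8 or later.

```python
import math
from collections import Counter

def knn_predict(train_x, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    dists = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))
    votes = Counter(y for _, y in dists[:k])
    return votes.most_common(1)[0][0]

def scale(point, lows, highs):
    """Min-max scale one point using per-feature bounds."""
    return tuple((v - lo) / (hi - lo) for v, lo, hi in zip(point, lows, highs))

# Hypothetical (cholesterol mg/dL, stress level 1-10) points and labels.
train_x = [(200, 9), (205, 8), (210, 9), (290, 2), (295, 1), (300, 2)]
train_y = [1, 1, 1, 0, 0, 0]
query = (280, 9)

raw_label = knn_predict(train_x, train_y, query)

lows, highs = (200, 1), (300, 9)  # assumed feature ranges for this toy data
scaled_x = [scale(x, lows, highs) for x in train_x]
scaled_label = knn_predict(scaled_x, train_y, scale(query, lows, highs))
```

With these made-up points, the raw query is voted into class 0 because the cholesterol axis dominates the distance, while the scaled query is voted into class 1 once stress level carries comparable weight.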
MODELS USED FOR HEART DISEASE PREDICTION:
Support Vector Machine (SVM):
Finds the best hyperplane to separate data classes in high-dimensional space.
Advantages: Effective for high-dimensional data, robust with a clear margin of
separation.
Disadvantages: High computational cost, slow training for large datasets.
Logistic Regression:
Predicts binary outcomes based on a logistic function.
Advantages: Easy to implement and interpret, well-suited for binary classification.
Disadvantages: Assumes linearity between features and log odds, which may not hold
for complex relationships.
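The linearity assumption can be made concrete with a short sketch: the model passes a weighted sum of the features through the logistic function, so the log odds are exactly that weighted sum. The coefficients and feature values below are hypothetical, not fitted values from the study.

```python
import math

def predict_proba(features, weights, bias):
    """P(disease = 1) under a logistic model: sigmoid of a linear score."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

def log_odds(p):
    """Inverse of the sigmoid: log(p / (1 - p))."""
    return math.log(p / (1.0 - p))

# Hypothetical coefficients for (scaled cholesterol, scaled stress level).
weights, bias = [1.5, 2.0], -1.0
p = predict_proba([0.4, 0.7], weights, bias)
recovered_score = log_odds(p)  # equals the linear score bias + w . x
```

Here the linear score is -1.0 + 1.5 * 0.4 + 2.0 * 0.7 = 1.0, and applying `log_odds` to the predicted probability recovers that score. Any interaction between stress and cholesterol would have to be added as an explicit feature.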
XGBoost:
Optimized gradient boosting framework that builds sequential trees to correct errors.
Advantages: High accuracy, reduces overfitting with regularization.
Disadvantages: Computationally expensive, requires significant training time.
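The idea of sequential trees correcting earlier errors can be sketched with plain gradient boosting on squared error, using one-split stumps on a single feature. This is a simplified stand-in rather than XGBoost itself, which adds regularization and second-order gradient information. The function names, learning rate, round count, and data are assumptions, and the stump fitter assumes at least two distinct feature values.

```python
def fit_stump(xs, residuals):
    """Least-squares one-split regressor on a single feature."""
    best = None
    for t in sorted(set(xs))[:-1]:  # candidate split: x <= t
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm

def boost(xs, ys, rounds=20, lr=0.5):
    """Fit stumps one after another, each on the current residuals."""
    base = sum(ys) / len(ys)  # start from the mean label
    stumps = []
    preds = [base] * len(xs)
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + lr * stump(x) for p, x in zip(preds, xs)]
    return lambda x: base + lr * sum(s(x) for s in stumps)

# Hypothetical stress levels and disease labels.
xs = [1, 2, 3, 4, 5, 6]
ys = [0, 0, 1, 0, 1, 1]
model = boost(xs, ys)
```

Each round fits the residuals left by the previous rounds, so the training error never increases; the learning rate trades speed of fit against the risk of overfitting.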
RESULT ANALYSIS
CONCLUSION
• The study integrates mental health indicators (stress, sleep, working hours) with
traditional cardiovascular risk factors to predict heart disease, enhancing model
accuracy across different demographic groups.
• Future research will focus on multi-modal models, wearable device integration for
real-time monitoring, and mobile applications to improve early diagnosis and
healthcare access, particularly in low-resource settings.
REFERENCES
[7] P. Soni and R. Sharma, "Deep Learning Models for Heart Disease
Prediction: A Review," Computers in Biology and Medicine, vol. 147, p.
105436, 2022.
[8] K. Psychogyios, L. Ilias, and D. Askounis, "Comparison of Missing Data
Imputation Methods Using the Framingham Heart Study Dataset," in 2022
IEEE-EMBS International Conference on Biomedical and Health
Informatics (BHI), 2022, pp. 1–5.
