Decision Tree
Modelling With
Orange
Identify Rules that Predict
Patient’s Heart Disease
Author: Anthony Mok
Date: 18 Nov 2023
Email: xxiaohao@yahoo.com
Characteristics of Orange
Visual programming makes data
mining accessible to a broader
audience
Provides comprehensive data
preprocessing tools
A vast collection of machine learning
algorithms is available
Excels in interactive data visualisation
Scalable, and integrates with external
software packages
An open-source project with a vibrant
community
Project’s Context, Objective & Strategies
Make Insight-informed Decisions
Clinic collected data on heart
disease diagnosis and other
patient information, and wants to
use the data to make insight-
informed decisions
Predict Patient’s Well-being
To identify the rules that will
predict whether a patient will have
heart disease in the future, based
on the data collected on him/her
Deploy Decision Tree Model
Create a Decision Tree Model, with
rules, to predict whether a patient
will have heart disease in the future,
based on the collected data
Train and evaluate the model
Boost the model’s performance
Conduct predictions
Exploratory Data Analysis (EDA)
Findings
Target = Heart Disease
This is a categorical variable with
a limited number of possible values,
so predicting it is a classification
task rather than a regression on a
continuous variable such as blood
pressure or cholesterol level
Feature Columns = 9
Row Instances = 918
Blanks & Outliers = None
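The check for blanks can also be reproduced outside Orange. The following is a minimal sketch, assuming a CSV file with a header row. The column names are the ones mentioned in the slides, but the three sample rows are invented for illustration and are not taken from Medical.csv; the helper name count_blanks is also hypothetical.

```python
import csv
import io

# Toy stand-in for Medical.csv: these rows are made up for illustration only.
SAMPLE = """Gender,FastingBS,Exercise,Cholesterol,MaxHR,HeartDisease
M,0,N,289,172,0
F,0,N,180,156,1
M,1,Y,,98,1
"""

def count_blanks(csv_text):
    """Return {column name: number of empty cells} for CSV content given as text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    blanks = {name: 0 for name in reader.fieldnames}
    for row in reader:
        for name in reader.fieldnames:
            value = row.get(name)
            # A short row yields None; an empty field yields "".
            if value is None or value.strip() == "":
                blanks[name] += 1
    return blanks

blanks = count_blanks(SAMPLE)
print(blanks)
```

A dataset with no blanks, as reported for the clinic's data, would give a count of zero for every column. The toy sample deliberately contains one empty Cholesterol cell so that the check has something to find.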
Decision Tree Workflow in Orange
Loading File, Selecting Columns & Splitting Data
Loading File
Medical.csv was loaded into the workflow with
‘Gender’, ‘FastingBS’ & ‘Exercise’ classified as
‘categorical’ data & given the ‘feature’ role, and
‘HeartDisease’ classified as ‘categorical’ data
& given the ‘target’ role in the ‘File’ widget
Selecting Columns
In the ‘Select Columns’ widget,
all feature columns were placed
in the ‘Features’ box. ‘HeartDisease’,
which is the target, was
moved into the ‘Target’ box in this widget
Splitting Data
Dataset divided into 70% for
training the model while
keeping the remaining
30% for testing the model
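The 70/30 split can be sketched in a few lines of Python. This is only an illustration of the idea: the slides do not say how Orange rounded the row counts, whether the sampling was stratified, or which random seed was used, so the function name, the rounding and the seed below are all assumptions.

```python
import random

def split_indices(n_rows, train_fraction=0.7, seed=0):
    """Shuffle the row indices and split them into (train, test) index lists."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)      # fixed seed for a repeatable split
    n_train = round(n_rows * train_fraction)  # assumed rounding rule
    return indices[:n_train], indices[n_train:]

# 918 rows, as reported in the EDA findings.
train_idx, test_idx = split_indices(918)
print(len(train_idx), len(test_idx))
```

Under this rounding, 918 rows give 643 training rows and 275 testing rows. The exact counts used in the Orange workflow are not stated in the slides.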
Initial Evaluation of Decision Tree Model
Evaluation of Model (30%)
Classification Accuracy for this
model, evaluated on the 30% testing
split of the dataset, is 76.4%
Tree Depth Limited to 10
For initial assessment of the
performance of the Decision Tree
Model, in the Tree Widget, the
maximal tree depth was limited to 10
Evaluation of Model (70%)
Classification Accuracy for this
model, evaluated on the 70% training
split of the dataset, is 97.1%
Findings
At the Tree Depth of 10, the model's
accuracy differed by about 20.7 percentage
points (97.1% vs 76.4%) between the
training and testing datasets
Conclusion
Suggests that the Decision
Tree Model has been
overfitted to the training data
Follow-up
Tune the hyperparameters of
the model so that it
generalises better and performs well
on the testing data
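The effect of the depth limit on overfitting can be illustrated with a small hand-written tree. This is a minimal sketch under stated assumptions: it is a plain Gini-based binary tree written for illustration, not Orange's Tree widget, and it runs on synthetic noisy data rather than the clinic's dataset. All function names are hypothetical.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Return the (feature, threshold) that most reduces weighted Gini, or None."""
    best, best_score = None, gini(labels)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, t), score
    return best

def build(rows, labels, max_depth):
    """Grow a tree. A leaf is a class label; a node is (feature, threshold, left, right)."""
    split = best_split(rows, labels) if max_depth > 0 else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]  # majority class
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (f, t,
            build([r for r, _ in left], [y for _, y in left], max_depth - 1),
            build([r for r, _ in right], [y for _, y in right], max_depth - 1))

def predict(tree, row):
    """Follow the splits from the root until a leaf label is reached."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree

def accuracy(tree, rows, labels):
    return sum(predict(tree, r) == y for r, y in zip(rows, labels)) / len(labels)

# Synthetic two-feature data with 20% label noise (NOT the clinic's data).
rng = random.Random(42)
rows = [(rng.random(), rng.random()) for _ in range(300)]
labels = [int((a + b > 1.0) != (rng.random() < 0.2)) for a, b in rows]
train_X, test_X = rows[:210], rows[210:]   # 70% / 30%
train_y, test_y = labels[:210], labels[210:]

results = {}
for depth in (3, 10):
    tree = build(train_X, train_y, depth)
    results[depth] = (accuracy(tree, train_X, train_y), accuracy(tree, test_X, test_y))
    print(f"depth={depth}: train={results[depth][0]:.3f}, test={results[depth][1]:.3f}")
```

Because the deeper tree only refines the shallower one, its training accuracy can never be lower. Its testing accuracy carries no such guarantee, and that is the gap the tuning step tries to close.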
Tuning the Model to Improve Generalisation
Evaluation of Model (30%)
Classification Accuracy for this
model, evaluated on the 30% testing
split of the dataset, is 80.7%
Tree Depth Now Limited to 3
To tune the model, the maximal tree
depth was adjusted several times.
A depth of 3 was
chosen because the Classification
Accuracy scores on the training
and testing data are both high (about 80%)
while the difference between the scores
is small (1.6 percentage points)
Evaluation of Model (70%)
Classification Accuracy for this
model, evaluated on the 70% training
split of the dataset, is 82.3%
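The comparison behind this choice can be written out as simple arithmetic on the reported scores. The sketch below uses only the four accuracy figures given in the slides. The selection rule (smallest train-minus-test gap) is an assumption made for illustration; the slides describe the choice informally and also require both scores to stay high.

```python
# Accuracy figures as reported on the slides: depth -> (training %, testing %).
reported = {10: (97.1, 76.4), 3: (82.3, 80.7)}

def gap(train_acc, test_acc):
    """Training minus testing accuracy, in percentage points, to one decimal."""
    return round(train_acc - test_acc, 1)

gaps = {depth: gap(*scores) for depth, scores in reported.items()}
best_depth = min(gaps, key=gaps.get)  # assumed rule: smallest gap wins
print(gaps, best_depth)
```

Taken alone, a smallest-gap rule would also favour a trivially shallow tree, so in practice it has to be paired with a floor on testing accuracy, as the slide's "about 80%" condition implies.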
Confusion Table: False Positives/Negatives
Tree Depth at 10
False Negative = 17.8%
False Positive = 27.4%
Tree Depth at 3
False Negative = 19.1%
False Positive = 19.4%
Patients may become untreatable when their conditions go untreated (for False Negatives) or may
have to pay for unwanted treatments and bear the consequences of unnecessary side-effects from
the treatment (for False Positives). So, reducing the number of False Negatives and False Positives
in the model is beneficial
While False Negatives have increased by 1.3 percentage points, False
Positives have dropped by 8 percentage points, and the model’s overall
Classification Accuracy has improved by 4.3 percentage points
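For reference, false-negative and false-positive percentages can be derived from confusion-table counts as sketched below. The slides report only percentages and do not state how they were normalised, so the counts here are hypothetical and the "share of the actual class" normalisation is an assumption. The last two lines recompute the changes between the two trees from the percentages on the slide.

```python
def error_rates(tp, fn, fp, tn):
    """Return (false-negative %, false-positive %) as shares of the actual class."""
    fn_rate = 100.0 * fn / (fn + tp)  # missed cases among patients with the disease
    fp_rate = 100.0 * fp / (fp + tn)  # false alarms among patients without it
    return round(fn_rate, 1), round(fp_rate, 1)

# Hypothetical counts for illustration only (not from the clinic's data).
example = error_rates(tp=80, fn=20, fp=30, tn=70)
print(example)

# Changes from depth 10 to depth 3, using the percentages reported on the slide.
fn_change = round(19.1 - 17.8, 1)  # positive: False Negatives went up
fp_change = round(19.4 - 27.4, 1)  # negative: False Positives went down
print(fn_change, fp_change)
```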
Rules Predicting Patient’s Heart Disease*
The sequence of splitting criteria suggests that Exercise is the top-priority
rule, with Cholesterol and MaxHR as the two other influencers of the
likelihood of Heart Disease in patients
* More details are found in the project report, which is not released at the request of the Clinic