2023 Supervised Learning for Orange3 from scratch

Tutorial Content
• What is supervised learning
• Decision Tree
• Build the supervised learning models
• Homework

Code
• Download code
• https://drive.google.com/drive/folders/19fTtqp-nyASeL-Qkpr7yjbt18YgfKsyj?usp=sharing

Reviewing the previous assignment

About 80% of the work is data pre-processing!
• It requires data engineering work.
• Launch Jupyter Lab and open the notebook.
Transpose_matrix.ipynb

Previous Homework (workflow)
03.ows

Previous Homework (Find the rules!)

What is supervised learning?

Supervised learning
• Supervised learning: discover patterns in the data that relate the data attributes to a target (label) attribute.
• These patterns are then used to predict the value of the target attribute in future data instances.
• Classic supervised learning tasks
• Classification => label prediction (for example, a Yes/No binary question)
• Regression => measurement prediction (a continuous numeric value)
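The difference between the two tasks can be sketched with a toy example. This is only an illustration: the 1-nearest-neighbour predictor is a stand-in chosen for brevity (it is not the model used in this tutorial), and the data values are made up.

```python
def nearest_target(train, x):
    """Return the target of the training row whose feature is closest to x."""
    return min(train, key=lambda row: abs(row[0] - x))[1]

# Classification: the target is a label (hypothetical toy data).
labelled = [(12.0, "No"), (18.0, "Yes"), (22.0, "Yes"), (30.0, "No")]
# Regression: the target is a continuous number (hypothetical toy data).
measured = [(12.0, 1.5), (18.0, 2.5), (22.0, 3.0), (30.0, 4.5)]

label = nearest_target(labelled, 21.0)  # nearest training feature is 22.0
value = nearest_target(measured, 13.0)  # nearest training feature is 12.0
print(label, value)
```

The same learning idea returns a category in the first case and a number in the second; only the type of the target attribute changes.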

Supervised learning
Ref: https://www.tibco.com/reference-center/what-is-supervised-learning

About the decision tree model

Decision Tree Model
• A decision tree is a prediction method that classifies samples through a series of conditions with YES/NO answers.
• Because it resembles human thought processes, its results are easy to understand.
• The parts involving conditions are called nodes or internal nodes. The topmost node is called the root node.
• The end nodes of a decision tree are called leaf nodes; they represent the predicted categories.

[Figure: decision tree diagram labeling the root node, internal nodes, leaf nodes, and a split]

The tree model is one kind of supervised learning model

About the tree-based model
• In general, it can make classification predictions or numerical predictions.
• A more advanced tree-based model is the Random Forest model, which combines multiple CART models.
• The idea is to ensemble several weak learners to construct a stronger model.
Leo Breiman introduced the CART, Random Forest, and Bagging algorithms.
https://en.wikipedia.org/wiki/Leo_Breiman

• Classify based on the internal conditions of feature values.
Temp < 15?
  Yes: Uncozy
  No: Temp > 25?
    Yes: Uncozy
    No: Humid < 40%?
      Yes: Uncozy
      No: Humid > 60%?
        Yes: Uncozy
        No: Cozy
Feature importance: Temp > Humid
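The tree on this slide can be read as a chain of conditions. A minimal sketch, assuming the thresholds shown on the slide; the function name `is_cozy` is made up for illustration, and humidity is given in percent:

```python
def is_cozy(temp, humid):
    """Walk the tree from the root: temperature checks first, then humidity."""
    if temp < 15:
        return "Uncozy"
    if temp > 25:
        return "Uncozy"
    if humid < 40:
        return "Uncozy"
    if humid > 60:
        return "Uncozy"
    return "Cozy"

print(is_cozy(20, 50))  # inside both ranges
print(is_cozy(10, 50))  # too cold, decided at the root node
```

A sample is labelled Cozy only when it passes all four conditions, which matches the single Cozy leaf at the bottom of the tree.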

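• Classification
[Figure: Humid % plotted against Temp; the Cozy region lies between Temp 15 and 25 and between Humid 40 and 60, and everything outside it is Non-Cozy]
https://sharkyun.medium.com/decision-tree-%E6%B1%BA%E7%AD%96%E6%A8%B9-41597818c075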

• Each block has LIKE and DISLIKE samples
• The probability of each block:
• The first block: 5/6
• The second block: 8/12
• The third block: 3/10
• The fourth block: 1/4
Quiz:
1. How do you select the conditions with the most LIKE samples?

• As a regression solver, the leaf nodes contain numeric values, such as the:
• Mean
• Mode
• Median
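As a small illustration of what a regression leaf can store, the sketch below summarizes the training targets that end up in one leaf. The target values are hypothetical, and which summary a given tool uses depends on its implementation.

```python
from statistics import mean, median, mode

# Hypothetical target values of the training samples that fall into one leaf.
leaf_targets = [10.0, 12.0, 12.0, 14.0, 22.0]

leaf_mean = mean(leaf_targets)      # sensitive to the outlier 22.0
leaf_median = median(leaf_targets)  # middle value of the sorted targets
leaf_mode = mode(leaf_targets)      # most frequent value

print(leaf_mean, leaf_median, leaf_mode)
```

Every new sample that reaches this leaf receives the same stored number as its prediction.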

Quiz:
1. Can you explain the importance of features?
https://github.com/scikit-learn/scikit-learn/blob/main/sklearn/datasets/descr/diabetes.rst

• Fast computation speed.
• Minimal data engineering required: no need for data normalization, dummy variables, one-hot encoding, etc.
• CART can handle both continuous and categorical features simultaneously.
• Sensitive to class imbalance in the dependent variable, so the data needs to be rebalanced in advance.
• Over-sampling (Synthetic Minority Oversampling Technique, SMOTE)
• Under-sampling
• High interpretability, suitable for visual analysis, and easy to extract rules.
• The model results serve as conditions, and can be directly retrieved later using SQL syntax.
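The two rebalancing ideas can be sketched in plain Python. Note that this is random over-sampling (duplicating minority rows), not SMOTE, which synthesizes new minority samples by interpolation; the helper names and the toy data are made up for illustration.

```python
import random
from collections import Counter

def split_by_class(rows):
    """Group (features, label) rows by their label."""
    groups = {}
    for features, label in rows:
        groups.setdefault(label, []).append((features, label))
    return groups

def random_over_sample(rows, rng):
    """Duplicate random rows until every class matches the largest one."""
    groups = split_by_class(rows)
    target = max(len(group) for group in groups.values())
    balanced = []
    for group in groups.values():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        balanced.extend(group + extra)
    return balanced

def random_under_sample(rows, rng):
    """Keep a random subset of each class, sized like the smallest one."""
    groups = split_by_class(rows)
    target = min(len(group) for group in groups.values())
    balanced = []
    for group in groups.values():
        balanced.extend(rng.sample(group, target))
    return balanced

rng = random.Random(42)
# Hypothetical imbalanced dataset: 8 "no" rows and 2 "yes" rows.
rows = [([i], "no") for i in range(8)] + [([i], "yes") for i in range(2)]
over = Counter(label for _, label in random_over_sample(rows, rng))
under = Counter(label for _, label in random_under_sample(rows, rng))
print(over, under)
```

Rebalancing should be applied to the training data only, so that the evaluation still reflects the real class distribution.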

The 3 types of tree-based models
• ID3 algorithm
• Chooses the split with the highest information gain
• C4.5 algorithm
• Chooses the split with the highest information gain ratio
• CART algorithm (Classification and Regression Tree)
• Chooses the split with the lowest Gini impurity
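The Gini impurity used by CART can be computed as 1 minus the sum of squared class proportions. A minimal sketch with made-up class counts; the function names are illustrative only:

```python
def gini_impurity(counts):
    """Gini impurity 1 - sum(p_i^2) for a node with the given class counts."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

def weighted_gini(children):
    """Size-weighted Gini impurity of the child nodes produced by one split."""
    total = sum(sum(child) for child in children)
    return sum(sum(child) / total * gini_impurity(child) for child in children)

pure = gini_impurity([10, 0])   # a node with a single class
mixed = gini_impurity([5, 5])   # a 50/50 node
# Hypothetical split of 10 samples into a pure child and a mixed child.
split = weighted_gini([[4, 0], [1, 5]])
print(pure, mixed, split)
```

Among the candidate splits, CART keeps the one whose weighted impurity is lowest.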

The 3 types of Tree based models
https://github.com/richzw/MachineLearningTips/blob/master/DecisionTree.md

Node split criteria
• The criteria for splitting nodes are discussed mainly to explore feature importance.
• Why do we need the importance of features?

Node split criteria
• In 1948, Shannon published "A Mathematical Theory of Communication", laying the foundation for modern information theory. He introduced the concept of information entropy, which solved the problem of quantifying information.
• The greater the uncertainty of a problem, the more information is needed to understand it, which means a higher information entropy.
$H(X) = -\sum_{x} P(x) \log_2 P(x)$, where $P(x)$ is the probability of event $x$ occurring.
[Photo: Claude Shannon]

Node split criteria
• A box contains 5 white balls and 5 red balls. If you randomly pick one ball, what is the color of that ball? How much information does this question carry?
• The probabilities of getting a white ball and of getting a red ball are both 1/2. Plugged into the information entropy formula:
$H = -\left(\tfrac{1}{2}\log_2\tfrac{1}{2} + \tfrac{1}{2}\log_2\tfrac{1}{2}\right) = 1$ bit
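The ball example can be checked numerically. A minimal sketch, assuming entropy is measured in bits (logarithm base 2); the function name `entropy` is chosen for illustration:

```python
from math import log2

def entropy(counts):
    """Shannon entropy in bits for the given class counts."""
    total = sum(counts)
    return sum(-(c / total) * log2(c / total) for c in counts if c > 0)

box = entropy([5, 5])    # 5 white balls and 5 red balls
single = entropy([10])   # only one colour: no uncertainty
print(box)
```

The 50/50 box is the most uncertain two-colour case, while a box with a single colour carries no uncertainty at all.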

Node split criteria
• When building a decision tree, how do you prioritize which feature to
choose for splitting?
• Examine all features, calculate the change in information entropy before and
after splitting the dataset for each feature.
• Finally, choose the feature that results in the largest change in information
entropy as the primary basis for splitting nodes.

Node split criteria
Ball Color        Red    White   Black   Blue
Quality           1      1       1       2
Quantity          1      2       1       2
Number of balls   2      2       4       8
Probability       2/16   2/16    4/16    8/16
• Which feature should be split on first: Quality or Quantity?

Node split criteria
Split on Quantity = 1 (16 balls, Entropy = 1.75)
  No: White*2, Blue*8 (Entropy = 0.721928)
  Yes: Red*2, Black*4 (Entropy = 0.918296)
Split on Quality = 1 (16 balls, Entropy = 1.75)
  No: Blue*8 (Entropy = 0)
  Yes: Red*2, White*2, Black*4 (Entropy = 1.5)
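The change in entropy for the two candidate splits can be computed as the information gain: the parent entropy minus the size-weighted entropy of the children. A minimal sketch using the ball counts from the table; the function names are illustrative:

```python
from math import log2

def entropy(counts):
    """Shannon entropy in bits for the given class counts."""
    total = sum(counts)
    return sum(-(c / total) * log2(c / total) for c in counts if c > 0)

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the children."""
    total = sum(parent)
    remainder = sum(sum(child) / total * entropy(child) for child in children)
    return entropy(parent) - remainder

# Ball counts per colour, in the order Red, White, Black, Blue.
parent = [2, 2, 4, 8]
# Quantity = 1 -> Red, Black; otherwise -> White, Blue.
gain_quantity = information_gain(parent, [[2, 4], [2, 8]])
# Quality = 1 -> Red, White, Black; otherwise -> Blue.
gain_quality = information_gain(parent, [[2, 2, 4], [8]])
print(round(gain_quantity, 6), round(gain_quality, 6))
```

Quality gives the larger gain (1.0 versus about 0.954), so under this criterion it would be chosen for the first split.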

Quiz: How do you select the feature to split the node?

Node split criteria
• Using information gain as the criterion for feature splitting tends to favor features with the most distinct values.
• For example: using a personal ID as a feature might result in each leaf node having only one sample.
• Solution:
• Merge feature values based on business experience to reduce the number of distinct values.
• Use another criterion, such as the information gain ratio (C4.5).
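The bias toward many-valued features, and how the gain ratio counters it, can be sketched as follows. The example reuses the 16-ball counts (Red 2, White 2, Black 4, Blue 8) and adds a hypothetical ID feature that puts every ball in its own branch; the gain ratio here is the information gain divided by the entropy of the branch sizes.

```python
from math import log2

def entropy(counts):
    """Shannon entropy in bits for the given class counts."""
    total = sum(counts)
    return sum(-(c / total) * log2(c / total) for c in counts if c > 0)

def gain_and_ratio(parent, children):
    """Return (information gain, gain ratio) for one candidate split."""
    total = sum(parent)
    remainder = sum(sum(child) / total * entropy(child) for child in children)
    gain = entropy(parent) - remainder
    split_info = entropy([sum(child) for child in children])
    return gain, gain / split_info

parent = [2, 2, 4, 8]
# Hypothetical ID feature: 16 branches holding one ball each.
id_gain, id_ratio = gain_and_ratio(parent, [[1] for _ in range(16)])
# Quality feature: Red, White, Black in one branch and Blue in the other.
quality_gain, quality_ratio = gain_and_ratio(parent, [[2, 2, 4], [8]])
print(id_gain, id_ratio, quality_gain, quality_ratio)
```

Plain information gain prefers the ID feature (1.75 versus 1.0), while the gain ratio prefers Quality (1.0 versus 0.4375), because the ID split is penalized for creating 16 branches.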

Node split criteria
• Tree Orange 3 (Hyper-parameters)
• https://orange3.readthedocs.io/projects/orange-visual-programming/en/latest/widgets/model/tree.html

Node split criteria
• Prioritize the hyper-parameter [limiting the maximum tree depth]

About bias and variance
• It is like marksmanship training:
• Hitting the target
• Low bias, low variance
• Concentrated but not accurate
• Low variance, high bias

Lower the total error
When a model overfits, it comes with high variance

What is overfitting?
Reasons for overfitting:
1. The dataset has noise
2. The hyper-parameters are too complex
3. The dataset is too small
Ways to reduce overfitting:
1. Early stopping
2. Feature reduction
3. Normalization
4. Adjust the hyper-parameters
5. Change to another algorithm

What is underfitting?
• The model is too simple
• Increase the number of iterations until convergence
• Adjust the hyper-parameters
• Add more features to the dataset
• Change to a more complex model

Supervised learning evaluation
• Confusion matrix (for classification)

Supervised learning evaluation
True Positive Rate (TPR)
False Positive Rate (FPR)
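Both rates come straight from the confusion matrix: TPR = TP / (TP + FN) and FPR = FP / (FP + TN). A minimal sketch with made-up actual and predicted labels, treating "Yes" as the positive class:

```python
def confusion_counts(actual, predicted, positive="Yes"):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

# Hypothetical labels for eight test samples.
actual = ["Yes", "Yes", "Yes", "No", "No", "No", "No", "Yes"]
predicted = ["Yes", "No", "Yes", "No", "Yes", "No", "No", "Yes"]

tp, fp, fn, tn = confusion_counts(actual, predicted)
tpr = tp / (tp + fn)  # share of actual positives that were found
fpr = fp / (fp + tn)  # share of actual negatives flagged by mistake
print(tp, fp, fn, tn, tpr, fpr)
```

The ROC curve plots TPR against FPR as the decision threshold of the classifier changes.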

Tree based models and prediction

Tree-based model (it uses the ID3 algorithm)
• Orange 3 has its own in-house implementation of the tree-based model.
• You must check the feature values of your dataset carefully!
• Why?
• In Orange 3, the tree accepts both categorical and numeric inputs.
04.ows

Tree based model
Rank: Feature selection
Data Sampler: Splitting the dataset
ROC Analysis: Model evaluation
Confusion Matrix: Model evaluation
The AUC on the testing dataset should be below the AUC on the training dataset

Tree based model
-- Rules read from the tree, written as one SQL CASE expression
SELECT CASE
  WHEN "petal.length" <= 1.9 THEN 'Iris-setosa'
  WHEN "petal.length" > 1.9 AND "petal.width" > 1.7 THEN 'Iris-virginica'
  WHEN "petal.length" > 1.9 AND "petal.width" <= 1.7 AND "petal.length" <= 4.9 THEN 'Iris-versicolor'
  WHEN "petal.length" > 1.9 AND "petal.width" <= 1.7 AND "petal.length" > 4.9 AND "petal.width" <= 1.5 THEN 'Iris-virginica'
  WHEN "petal.length" > 1.9 AND "petal.width" <= 1.7 AND "petal.length" > 4.9 AND "petal.width" > 1.5 THEN 'Iris-versicolor'
END
FROM Iris;
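The same rules can be written as a small Python function, which makes it easy to try them on single measurements. A minimal sketch, assuming the thresholds from the rules above; the function name `classify_iris` is made up for illustration:

```python
def classify_iris(petal_length, petal_width):
    """Follow the extracted tree rules from the root down to a leaf."""
    if petal_length <= 1.9:
        return "Iris-setosa"
    if petal_width > 1.7:
        return "Iris-virginica"
    if petal_length <= 4.9:
        return "Iris-versicolor"
    if petal_width <= 1.5:
        return "Iris-virginica"
    return "Iris-versicolor"

print(classify_iris(1.4, 0.2))
print(classify_iris(6.0, 2.5))
```

Each `if` corresponds to one node of the tree, so the nested conditions of the rule list collapse into a sequence of early returns.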

Tree based model
• Quiz: Can you read the feature importance from the tree layout?
• Yes, petal.length is the most significant feature!
Feature selection

Random Forest algorithm
• The Random Forest algorithm's widespread popularity stems from its user-friendly nature and adaptability, which enable it to tackle both classification and regression problems effectively.
• Its strength lies in its ability to handle complex datasets and mitigate overfitting, making it a valuable tool for various predictive tasks in machine learning.

Bagging and bootstrap
• Randomly sample multiple subsets of the dataset (rows and features) and build a CART model on each subset.
• Each CART model must have an accuracy of over 50%.
• Finally, combine the outputs of these CART models to produce the ensemble prediction.
• Voting
https://gaussian37.github.io/ml-concept-bagging/
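The bootstrap-then-vote idea can be sketched in plain Python. To stay short, this sketch uses one-feature threshold rules ("stumps") instead of full CART models and samples rows only, whereas a Random Forest also samples features; the data and function names are made up for illustration.

```python
import random
from collections import Counter

def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement (a bootstrap sample)."""
    return [rng.choice(rows) for _ in rows]

def fit_stump(rows):
    """Pick the threshold with the fewest errors for the rule: predict 1 when x > threshold."""
    best = None
    for threshold in sorted({x for x, _ in rows}):
        errors = sum((x > threshold) != bool(y) for x, y in rows)
        if best is None or errors < best[0]:
            best = (errors, threshold)
    return best[1]

def majority_vote(predictions):
    """Return the most common prediction among the models."""
    return Counter(predictions).most_common(1)[0][0]

rng = random.Random(0)
# Hypothetical one-feature dataset: (x, label).
data = [(1.0, 0), (2.0, 0), (3.0, 0), (4.0, 1), (5.0, 1), (6.0, 1)]
# Train 25 stumps, each on its own bootstrap sample.
thresholds = [fit_stump(bootstrap_sample(data, rng)) for _ in range(25)]

def predict(x):
    """Ensemble prediction: every stump votes, the majority wins."""
    return majority_vote([int(x > t) for t in thresholds])

print(predict(0.5), predict(6.5))
```

An odd number of models avoids ties in a two-class vote.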

Random Forest algorithm
In general, Random Forest tends to perform better.

1. Use the training set to build the base models.
2. Stack all prediction outputs from the testing inputs.
3. Add the testing Y to the stacking dataset, and build a meta model.
4. Get the stacking results.
It supports classifiers or regressors.
It normally works well on complex data sets.

Stacking
• Download the Bank Marketing dataset

Stacking
This workflow is built from the training dataset only; there is no testing dataset!
Quiz: Do you know how to split the dataset and rebuild the workflow?
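Outside Orange, where the Data Sampler widget does this job, the split itself can be sketched as follows. The 70/30 proportion, the seed, and the function name are assumptions made for illustration.

```python
import random

def train_test_split(rows, train_fraction=0.7, seed=0):
    """Shuffle a copy of the rows and cut it into training and testing parts."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

rows = list(range(100))  # stand-in for 100 dataset rows
train, test = train_test_split(rows)
print(len(train), len(test))
```

The model is then built on `train` only and evaluated on `test`, so the reported scores reflect data the model has not seen.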

Stacking
• Overall, stacking gives better results!

Homework
• Use healthcare_dataset.csv to build a classification model and predict the Test Results.
• Follow steps such as:
• Exploratory Data Analysis / Pre-processing (missing values, data cleaning…)
• Feature selection
• Build multiple models and pick the one with the best accuracy.
• Submit your training/testing model evaluation
• Document the steps you used in a PPT, and clearly present your observations with text or illustrations.
