DECISION TREE
Submitted to:
Gyanaranjan Shial
Assistant Professor
Veer Surendra Sai University of Technology, Burla
SUBMITTED BY: -
SL NO. NAME REGISTRATION NO PAGE NO
1 SASWATI DAS CHOUDHURY 2002030149 4 - 15
2 ASHUTOSH MISHRA 2002030011 16 - 25
3 SATYABRATA DWIVEDY 2004050002 26 - 49
4 SMRITI PANDA 2004050024 50 - 65
5 SWARAJ PRADHAN 2002050082 66 - 82
6 BISHNU PRASAD SAHOO 2002050074 83 - 90
7 SUBHAM SAURAV PANDA 2002050124 91 - 96
Content
● Machine Learning
● Decision Tree Overview
● Examples, Splitting Criteria and Process
● Feature Selection and extraction, real world problems
● Training and Testing data set
● Advantages and disadvantages of Decision Tree
● Building Decision Tree
● Decision Tree Algorithms
● Missing Data, Effective decision tree
● Conclusion
Name-Saswati Das Choudhury
Registration number-2002030149
TOPICS: Machine Learning, Supervised Learning, Decision Tree
Overview and its code in Python
Machine Learning
● Machine learning (ML) investigates how computers can learn based on data. ML
approaches have been applied to large language models, computer vision, speech
recognition, email filtering, agriculture and medicine.
● The term machine learning was coined in 1959 by Arthur Samuel. The synonym self-
teaching computers was also used in this time period.
● Machine learning and data mining often employ the same methods and overlap
significantly, but while ML focuses on prediction, based on known properties learned
from the training data, data mining focuses on the discovery of (previously) unknown
properties in the data.
● Machine learning also employs data mining methods as "unsupervised learning" or
as a preprocessing step to improve learner accuracy.
● Modern-day machine learning has two objectives: one is to classify data based on
models that have been developed; the other is to make predictions for future
outcomes based on these models.
● The mathematical foundations of ML are provided by mathematical optimization
(mathematical programming) methods.
● Machine learning approaches are traditionally divided into three broad categories:
supervised learning, unsupervised learning and reinforcement learning.
Supervised Learning
● Supervised learning algorithms build a mathematical model of a set of data that
contains both the inputs and the desired outputs. The data is known as training data
and consists of a set of training examples; each training example is represented by
an array or vector, sometimes called a feature vector, and the training data is
represented by a matrix.
● Types of supervised-learning algorithms include active learning, classification and
regression.
Classification algorithms are used when the outputs are restricted to a limited set of values, and
regression algorithms are used when the outputs may have any numerical value within a range.
For example, for a classification algorithm that filters emails, the input would be an incoming email and the output would be the name of the folder in which to file the email.
Decision Tree
● Decision Tree is a supervised learning technique that can be used for both
classification and regression problems, but mostly it is preferred for solving
classification problems. It is a tree-structured classifier, where internal nodes
represent the features of a dataset, branches represent the decision rules and
each leaf node represents the outcome.
● In a Decision tree, there are two nodes, which are the Decision Node and Leaf
Node. Decision nodes are used to make any decision and have multiple
branches, whereas Leaf nodes are the output of those decisions and do not
contain any further branches.
● The decisions or the test are performed on the basis of features of the given
dataset.
● It is a graphical representation for getting all the possible solutions to a
problem/decision based on given conditions.
● A decision tree simply asks a question, and based on the answer (Yes/No),
it further splits the tree into subtrees.
● The basic algorithm used in decision trees is known as the ID3 (by Quinlan)
algorithm. The ID3 algorithm builds decision trees using a top-down, greedy
approach.
Decision Tree code in Python
# Import necessary libraries
from sklearn import tree
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
# Load the iris dataset (as an example)
iris = load_iris()
X = iris.data
y = iris.target
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Create a decision tree classifier
clf = tree.DecisionTreeClassifier()
# Train the classifier on the training set
clf.fit(X_train, y_train)
# Make predictions on the test set
y_pred = clf.predict(X_test)
# Evaluate the accuracy of the model
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy * 100:.2f}%")
OUTPUT
The output of the code will be the accuracy of the decision tree classifier on the
test set. Because the result depends on the dataset and on how it is split
(fixed here by random_state=42), the exact accuracy value may differ if either
changes. Here's an example of what the output might look like:
OUTPUT:
Accuracy: 100.00%
Name-Ashutosh Mishra
Registration number-2002030011
TOPICS:- Decision Tree Structure, Examples,
Splitting Criteria and Process
Decision Tree Algorithm
● Decision Tree algorithm belongs to the family of supervised learning algorithms.
● Unlike some other supervised learning algorithms, the decision tree algorithm can be
used for solving both regression and classification problems.
● The goal of using a Decision Tree is to create a training model that can be used to
predict the class or value of the target variable by learning simple decision rules
inferred from the training data.
● In Decision Trees, for predicting a class label for a record we start from the root
of the tree.
Types of Decision Trees
Types of decision trees are based on the type of target variable we have. It can be of two
types:
1. Categorical Variable Decision Tree: a decision tree that has a categorical target
variable is called a categorical variable decision tree.
2. Continuous Variable Decision Tree: a decision tree that has a continuous target
variable is called a continuous variable decision tree.
Important Terminology related to Decision Trees
➢ Root Node: It represents the entire population or sample and this further gets
divided into two or more homogeneous sets.
➢ Splitting: It is a process of dividing a node into two or more sub-nodes.
➢ Decision Node: When a sub-node splits into further sub-nodes, it is called a
decision node.
➢ Leaf / Terminal Node: Nodes that do not split are called leaf or terminal nodes.
➢ Pruning: When we remove sub-nodes of a decision node, the process is called
pruning. It can be seen as the opposite of splitting.
➢ Branch / Sub-Tree: A subsection of the entire tree is called branch or sub-
tree.
➢ Parent and Child Node: A node, which is divided into sub-nodes is called a
parent node of sub-nodes whereas sub-nodes are the child of a parent node.
How do Decision Trees work ?
● The decision of making strategic splits heavily affects a tree’s accuracy. The
decision criteria are different for classification and regression trees.
● Decision trees use multiple algorithms to decide to split a node into two or more
sub-nodes.
● The creation of sub-nodes increases the homogeneity of resultant sub-nodes and
increases purity of the node with respect to the target variable.
● The decision tree splits the nodes on all available variables and then selects the
split which results in most homogeneous sub-nodes.
Node Splitting in a Decision Tree
● Node splitting, or simply splitting, divides a node into multiple sub-nodes to
create relatively pure nodes.
● This is done by finding the best split for a node and can be done in multiple
ways.
The ways of splitting a node can be broadly divided into two categories based on the
type of target variable:
❏ Continuous Target Variable: Reduction in Variance
❏ Categorical Target Variable: Gini Impurity, Information Gain, and Chi-Square
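To make the split measures listed above concrete, here is a minimal Python sketch (not part of the original slides) that computes Gini impurity and entropy for a categorical target; the label counts are purely illustrative.

import math
from collections import Counter

def gini_impurity(labels):
    # Gini impurity: 1 - sum of squared class proportions
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    # Entropy: -sum(p * log2(p)) over the class proportions
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

labels = ["Yes"] * 9 + ["No"] * 5       # e.g. a node with 9 "Yes" and 5 "No" samples
print(round(gini_impurity(labels), 3))  # ~0.459
print(round(entropy(labels), 3))        # ~0.940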
Example Question
Suppose there is a candidate who has a job offer and wants to decide whether he
should accept the offer or not.
● The root node tests Salary.
● The root node splits further into the next decision node (distance from the office).
● That decision node further splits into one decision node (cab facility) and one
leaf node.
● The decision node splits into two leaf nodes (Accepted offer and Declined
offer).
Feature Selection And Feature Extraction,
Real World Problems
Name-Satyabrata Dwivedy
Registration number-2004050002
Need for reduction
● Classification of leukemia tumors from microarray gene expression data
○ 72 patients (data points)
○ 7130 features (expression levels of different genes)
● Text mining, document classification
○ features are words
● Quantitative Structure-Activity Relationship (QSAR)
○ features are molecular descriptors, there exist plenty of them
○ biological activity
■ an expression describing the beneficial or adverse effects of a drug on living
matter
○ Structure-Activity Relationship (SAR)
■ hypotheses that similar molecules have similar activities
○ molecular descriptor
■ mathematical procedure transforms chemical information encoded within a
symbolic representation of a molecule into a useful number
Molecular Descriptor
● adjacency (connectivity) matrix
● total adjacency index AV – the sum of all aij; a measure of the graph connectedness
● Randic connectivity indices – a measure of molecular branching
QSAR
• Form a mathematical/statistical relationship (model) between structural
(physiochemical) properties and activity.
• The mathematical expression can then be used to predict the biological
response of other chemical structures.
Selection vs. Extraction
● In feature selection we try to find the best subset of the input feature set.
● In feature extraction we create new features based on transformation or
combination of the original feature set.
● Both selection and extraction lead to the dimensionality reduction.
● No clear cut evidence that one of them is superior to the other on all types
of task.
Why to do it?
● We’re interested in features – we want to know which are relevant. If
we fit a model, it should be interpretable.
■ facilitate data visualization and data understanding
■ reduce experimental costs (measurements)
● We’re interested in prediction – features are not interesting in
themselves, we just want to build a good predictor.
■ faster training
■ defy the curse of dimensionality
Feature Selection
(FS)
Classification of FS methods
• Filter
– Assess the relevance of features only by looking at the intrinsic
properties of the data.
– Usually, calculate the feature relevance score and remove low-
scoring features.
• Wrapper
– Bundle the search for best model with the FS.
– Generate and evaluate various subsets of features. The
evaluation is obtained by training and testing a specific ML
model.
• Embedded
– The search for an optimal subset is built into the classifier
construction (e.g. decision trees).
Filter Methods
● Two steps (score-and-filter approach)
○ assess each feature individually for its potential in discriminating among classes
in the data
○ features falling beyond threshold are eliminated
● Advantages:
○ easily scale to high-dimensional data
○ simple and fast
○ independent of the classification algorithm
● Disadvantages:
○ ignore the interaction with the classifier
○ most techniques are univariate (each feature is considered separately)
Scores in filter methods
● Information measures: information gain, mutual information (complexity: O(d))
● Distance measures: Euclidean distance
● Dependence measures: Pearson correlation coefficient, χ2-test, t-test, AUC
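As a rough illustration of the score-and-filter approach (not from the slides), the sketch below uses scikit-learn's univariate mutual-information score to keep the top-scoring features; the iris data and k=2 are illustrative choices.

from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, mutual_info_classif

X, y = load_iris(return_X_y=True)

# Score each feature individually against the class label and keep the top k
selector = SelectKBest(score_func=mutual_info_classif, k=2)
X_reduced = selector.fit_transform(X, y)

print("Feature scores:", selector.scores_)
print("Kept feature indices:", selector.get_support(indices=True))
print("Reduced shape:", X_reduced.shape)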
Wrappers
● Search for the best feature subset in combination with a fixed classification
method.
● The goodness of a feature subset is determined using cross-validation (k-fold,
LOOCV)
● Advantages:
○ interaction between feature subset and model selection
○ take into account feature dependencies
○ generally more accurate
● Disadvantages:
○ higher risk of overfitting than filter methods
○ very computationally intensive
Exhaustive Search
• Evaluate all possible subsets using exhaustive search – this leads to the optimum subset.
• For a total of d variables and a subset of size p, the total number of possible subsets of
size p is d! / (p! (d - p)!), i.e. "d choose p"; e.g. d = 100, p = 10 → ≈ 2 × 10^13 subsets.
• Complexity over all subset sizes: O(2^d) (exponential)
• Various strategies exist to reduce the search space.
– They are still O(2^d), but much faster (at least 1000 times)
– e.g. “branch and bound”
Deterministic
● Sequential Forward Selection (SFS)
● Sequential Backward Selection (SBS)
● “Plus q take away r” Selection
● Sequential Forward Floating Search (SFFS)
● Sequential Backward Floating Search (SBFS)
Sequential Forward Selection
• SFS
• At the beginning select the best feature using a scalar
criterion function.
• Add one feature at a time which along with already
selected features maximizes the criterion function.
• A greedy algorithm, cannot retract (also called nesting
effect).
• Complexity is O(d)
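A minimal sketch of Sequential Forward Selection using scikit-learn's SequentialFeatureSelector (available since scikit-learn 0.24); the estimator, dataset and number of features are illustrative assumptions, not prescribed by the slides.

from sklearn.datasets import load_iris
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

sfs = SequentialFeatureSelector(
    DecisionTreeClassifier(random_state=0),
    n_features_to_select=2,   # stop once two features have been added
    direction="forward",      # add one feature at a time (use "backward" for SBS)
    cv=5,                     # subset goodness judged by cross-validation
)
sfs.fit(X, y)
print("Selected feature indices:", sfs.get_support(indices=True))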
Sequential Backward Selection
• SBS
• At the beginning select all d features.
• Delete one feature at a time and select the
subset which maximize the criterion function.
• Also a greedy algorithm, cannot retract.
• Complexity is O(d).
“Plus q take away r” Selection
• At first add q features by forward selection, then
discard r features by backward selection
• Need to decide optimal q and r
• Avoids the subset nesting problem that SFS and SBS suffer from
Sequential Forward Floating
Search
• SFFS
• It is a generalized “plus q take away r” algorithm
• The value of q and r are determined
automatically
• Close to optimal solution
• Affordable computational cost
• A backward variant (SBFS) also exists
Embedded FS
● The feature selection process is done inside the ML
algorithm.
● Decision trees
○ In final tree, only a subset of features are used
● Regularization
○ It effectively “shuts down” unnecessary features.
○ Pruning in NN.
Feature extraction
(FE)
● FS – identify and select the “best” features with respect to the
target task.
● Selected features retain their original physical interpretation.
● FE – create new features as a transformation (combination) of
original features. Usually followed by FS.
● May provide better discriminatory ability than the best subset.
● Do not retain the original physical interpretation, may not
have clear meaning.
Principal Component
Analysis
(PCA)
(Figures: scatter plots in the x1–x2 plane illustrating the steps below.)
● Centering: make the data have zero mean, i.e. move the data so that its mean lies at the
point [0, 0].
● The direction along which the variability of the data is highest is called the 1st principal
component; the orthogonal direction is the 2nd principal component.
● Principal components (PCs) are linear combinations of the original coordinates,
e.g. w0 + w1x1 + w2x2 and w'0 + w'1x1 + w'2x2.
● The coefficients of the linear combination (w0, w1, …) are called loadings.
● In the transformed coordinate system, individual data points have different coordinates;
these are called scores.
● PCA - orthogonal linear transformation that changes the data into a new
coordinate system such that the variance is put in order from the greatest to the
least.
● Solve the problem = find new orthogonal coordinate system = find loadings
● PC’s (vectors) and their corresponding variances (scalars) are found by
eigenvalue decomposition of the covariance matrix C = XX^T of the xi variables.
○ Eigenvector corresponding to the largest eigenvalue is 1st PC.
○ The 2nd eigenvector (the 2nd largest eigenvalue) is orthogonal to the 1st
one. …
● Eigenvalue decomposition is computed using standard algorithms: eigen
decomposition of covariance matrix (e.g. QR algorithm), SVD of mean centered
data matrix.
Interpretation of PCA
● New variables (PCs) have a variance equal to their corresponding eigenvalue:
Var(Yi) = λi for all i = 1…p
● Small λi ⇔ small variance ⇔ the data changes little in the
direction of component Yi
● The relative variance explained by each PC is given by λi / Σ λi
How many components?
● Keep enough PCs for the cumulative variance explained by the PCs to be >50–70%
● Kaiser criterion: keep PCs with eigenvalues >1
● Scree plot: represents the ability of the PCs to explain the variation in the data
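A short sketch (assuming scikit-learn is available) showing PCA on centered/standardized data and the quantities discussed above: loadings, scores and the variance explained per component. The 95% variance threshold is just one common rule of thumb, not a value from the slides.

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_iris(return_X_y=True)
X_centered = StandardScaler().fit_transform(X)   # centering (and scaling) the data

pca = PCA(n_components=0.95)                     # keep enough PCs to explain 95% of the variance
scores = pca.fit_transform(X_centered)           # coordinates in the new system ("scores")

print("Loadings (one row per PC):\n", pca.components_)
print("Variance explained per PC:", pca.explained_variance_ratio_)
print("Number of PCs kept:", pca.n_components_)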
Topic:-Training and Testing sets
Advantages and Disadvantages of Decision Tree
Name-Smriti Panda
Registration number-2004050024
Training Set of data
● A decision tree is consistent with a training set.
• The training set is used to build the decision tree. During this phase:
❑ The algorithm selects the best attribute to split the data based on
metrics like entropy or Gini impurity.
❑ The goal is to find the attribute that maximizes information gain or
reduces impurity after the split.
❑ The decision tree is constructed by recursively partitioning the data
based on attribute values.
❑ Each node in the tree represents a split point based on an attribute.
❑ The tree grows until a stopping criterion (e.g., maximum depth or
minimum samples per leaf) is met.
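The following hedged sketch shows how the split metric and stopping criteria described above map onto scikit-learn's DecisionTreeClassifier; the iris data and the specific parameter values are illustrative, not taken from the slides.

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

clf = DecisionTreeClassifier(
    criterion="entropy",    # split quality measured by entropy ("gini" for Gini impurity)
    max_depth=3,            # stopping criterion: maximum depth
    min_samples_leaf=5,     # stopping criterion: minimum samples per leaf
    random_state=0,
)
clf.fit(X, y)               # recursively partitions the training data
print("Tree depth:", clf.get_depth())
print("Number of leaves:", clf.get_n_leaves())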
Training Set of data
● Typically, for decision tree classification, the model should be
learned on training data with a predefined set of labels.
● It would predict a label (i.e., class) for new samples.
● So, we have a dataset with different attributes (features). Each
sample has its own combination of the value of the features.
Training Procedure
Training Procedure
The steps involved in System Model of training are as follows:
1. Analysis and Identification: Analyze and identify the training
needs: who needs training, what they need to learn, the estimated training
cost, etc. The next step is to develop a performance measure on the basis
of which actual performance will be evaluated.
2. Designing:
Design and provide training to meet identified needs. This step requires
developing objectives of training, identifying the learning steps,
sequencing and structuring the contents.
3. Developing:
This phase requires listing the activities in the training program that will
assist the participants to learn, selecting delivery method, examining the
training material and validating information to be imparted to make sure
it accomplishes all the goals and objectives.
Training Procedure
4. Implementation: Implementing is the hardest part of the
system because one wrong step can lead to the failure of the whole
training program.
5. Evaluation: Evaluate each phase so as to make sure it has
achieved its aim in terms of subsequent work performance, and make
necessary amendments to any of the previous stages in order to
remedy or improve failing practices.
Testing Set of data
• The test set is used to evaluate the performance of the decision tree.
• After constructing the tree using the training data, you evaluate how well it
performs on unseen data.
• For each instance in the test set, you call a function (often
named classify), passing in the newly-built tree and the data point you
want to classify.
• The function returns the leaf node to which the data point belongs,
effectively assigning a class label.
• By comparing the assigned class label to the actual label, you assess the
tree’s performance.
• A common practice is to shuffle the data and allocate 80% to training and
the remaining 20% to testing.
• The training set helps the decision tree to learn, while the test set
evaluates its accuracy.
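Below is a minimal, hypothetical sketch of the classify step described above: the tree is represented as nested dictionaries (internal nodes test one attribute, leaves hold a class label). Both the tree and the data points are made-up examples, not part of the slides.

def classify(tree, data_point):
    # Walk the tree until a leaf (a plain label) is reached
    if not isinstance(tree, dict):
        return tree                      # leaf node: return its class label
    attribute = next(iter(tree))         # attribute tested at this node
    value = data_point[attribute]
    return classify(tree[attribute][value], data_point)

# Hypothetical tree in the style of the "Play Golf" example used later in the deck
tree = {"Outlook": {"Overcast": "Yes",
                    "Sunny": {"Windy": {"True": "No", "False": "Yes"}},
                    "Rainy": {"Humidity": {"High": "No", "Normal": "Yes"}}}}

print(classify(tree, {"Outlook": "Overcast"}))                    # Yes
print(classify(tree, {"Outlook": "Rainy", "Humidity": "High"}))   # No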
Testing Procedure
(Figure: flow chart of the training and testing model)
Model-based testing
• Model-based testing is a software testing technique where the
run time behavior of the software under test is checked against
predictions made by a model.
• A model is a description of a system’s behavior.
• Behavior can be described in terms of input sequences,
actions, conditions, output, and flow of data from input to output.
• It should be practically understandable, reusable and shareable, and it
must include a precise description of the system under test.
Example: Predicting Diabetes
• To illustrate the use of decision trees, let's consider a simple example of predicting diabetes based
on certain features.
Example: Predicting Diabetes
• In this example, we used the diabetes_data.csv
dataset, which contains various features related to
diabetes, such as age, blood pressure, and glucose
level.
• The target variable, Outcome, indicates whether the
patient has diabetes (1) or not (0).
• We split the data into training and testing sets and then
built a decision tree model using the
DecisionTreeClassifier class from scikit-learn.
• Finally, we evaluated the model on the testing set and
printed the accuracy.
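A hedged sketch of the workflow just described; it assumes a local diabetes_data.csv file with feature columns and an "Outcome" label column, as stated on the slide.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

data = pd.read_csv("diabetes_data.csv")
X = data.drop(columns=["Outcome"])   # features such as age, blood pressure, glucose level
y = data["Outcome"]                  # 1 = has diabetes, 0 = does not

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)
print("Accuracy:", accuracy_score(y_test, model.predict(X_test)))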
A simple decision tree
The figure shows that the decision tree starts from
the root node and, after numerous training and
testing steps, produces leaf nodes as the result.
Advantages of Decision Tree
● Non-Parametric: Decision trees do not assume specific underlying data
distributions. This flexibility allows them to be applied to diverse problems
without worrying about model assumptions.
● Handling Categorical Values: Decision trees can naturally handle categorical
features without requiring explicit encoding or transformation.
● Minimal Data Preparation: Unlike some other algorithms, decision trees
require minimal data preprocessing. They can work directly with raw features,
reducing the need for extensive feature engineering.
● Non-Linear Models: Decision trees are inherently non-linear. They represent
piece-wise functions of various features in the feature space, making them
suitable for complex problems where linearity cannot be assumed.
Advantages of Decision Tree
• Relatively Easy to Interpret: Trained decision trees are generally
intuitive to understand. Their entire structure can be visualized as a
simple flow chart, making it easier for analysts and stakeholders to
grasp the decision-making process.
• Robust to Outliers: Well-regularized decision trees handle outliers
well. Predictions are generated by aggregating over a subsample of
training data, reducing the impact of outliers.
• Can Deal with Missing Values: The CART (Classification and
Regression Trees) algorithm naturally handles missing values.
Decision trees can be constructed without additional preprocessing to
address missing data.
• Combining Features for Predictions: Decision trees combine
decision rules (if-else conditions on input features) via AND
relationships as they traverse the tree. This enables the use of
feature combinations in making predictions.
Disadvantages of Decision Tree
● Prone to Overfitting: Decision trees can become overly complex and fit
noise in the training data, leading to poor generalization on unseen data.
● Sensitive to Noise: Decision trees can be sensitive to noisy data,
especially when the tree is deep.
● Sensitive to Changes in Data: Small changes in the training data can
significantly affect the tree’s structure, making it unstable.
● Greedy Algorithm: The tree-building process is greedy, meaning it
makes locally optimal decisions at each split without considering global
implications.
● Non-Continuous Predictions: Decision trees produce step-like
predictions, which may not be suitable for problems requiring smooth
outputs.
Building a Decision Tree: A Step-by-Step Approach
Constructing a decision tree for the "Play Golf" dataset.
Name-Swaraj Pradhan
Registration number-2002050082
Consider the table below. It represents factors that affect whether John would go out to play golf or not. Using
the data in the table, build a decision tree model that can be used to predict whether John would play golf
or not.
Figure 1: "Play Golf"
dataset
Step 1: Determine the Decision Column
➢ Since decision trees are used for classification, you need to determine the classes which are the basis for
the decision.
➢ In this case, it is the last column, that is, the Play Golf column with classes Yes and No.
➢ To determine the root node we need to compute the entropy.
➢ To do this, we create a frequency table for the classes (the Yes/No column).
Table 2: Frequency
Table
Step 2: Calculating Entropy for the classes (Play Golf)
Entropy
➢ Entropy is the measure of impurity (or uncertainty) in the data. For a two-class problem it lies between 0 and 1
and is calculated using the formula:
Entropy(S) = - Σ pi log2(pi), where pi is the proportion of examples of class i in S.
➢ Compute the entropy for the decision column (Play Golf) using the frequency table:
Entropy(PlayGolf) = E(5, 9) = - (5/14) log2(5/14) - (9/14) log2(9/14) ≈ 0.94
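As a quick check (not on the original slides), the entropy values used in this walkthrough can be reproduced with a few lines of Python:

import math

def entropy_counts(*counts):
    # E(a, b, ...) = -sum(p * log2(p)) over the class proportions
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

print(round(entropy_counts(5, 9), 3))   # 0.94  -> Entropy(PlayGolf)
print(round(entropy_counts(3, 2), 3))   # 0.971 -> E(3, 2), used in the next step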
Step 3: Calculate Entropy for Other Attributes After Split (contd..)
For the other four attributes, we need to calculate the entropy after each of the split.
● E(PlayGolf, Outlook)
● E(PlayGolf, Temperature)
● E(PlayGolf, Humidity)
● E(PlayGolf,Windy)
➢ The entropy for two variables is calculated using the formula E(S, T) = Σ P(c) × E(c), where the sum runs over
the values c of attribute T.
➢ Therefore, to calculate E(PlayGolf, Outlook), we expand this as:
E(PlayGolf, Outlook) = P(Sunny) E(3,2) + P(Overcast) E(4,0) + P(Rainy) E(2,3)
This frequency table is given below:
Table 3: Frequency Table for Outlook
Let’s go ahead and calculate E(3, 2):
E(3, 2) = - (3/5) log2(3/5) - (2/5) log2(2/5) ≈ 0.971
We do not need to calculate the second and the third terms separately, because
E(4, 0) = 0 and E(2, 3) = E(3, 2).
E (PlayGolf, Temperature) Calculation
Table 4: Frequency Table for Temperature
E(PlayGolf, Temperature) = P(Hot) E(2,2) + P(Cold) E(3,1) + P(Mild) E(4,2)
E (PlayGolf, Humidity) Calculation
Table 5: Frequency Table for Humidity
E (PlayGolf, Windy) Calculation
Table 6: Frequency Table for Windy
Step 4: Calculating Information Gain for Each Split
● Calculate information gain for each attribute using the formula:
Gain(S, T) = Entropy(S) – Entropy(S, T).
● Then the attribute with the largest information gain is used for the split.
Gain(PlayGolf, Outlook) = Entropy(PlayGolf) – Entropy(PlayGolf, Outlook)
= 0.94 – 0.693 = 0.247
Gain(PlayGolf, Temperature) = Entropy(PlayGolf) – Entropy(PlayGolf, Temperature)
= 0.94 – 0.911 = 0.029
Gain(PlayGolf, Humidity) = Entropy(PlayGolf) – Entropy(PlayGolf, Humidity)
= 0.94 – 0.788 = 0.152
Gain(PlayGolf, Windy) = Entropy(PlayGolf) – Entropy(PlayGolf, Windy)
= 0.94 – 0.892 = 0.048
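The gains above can be reproduced with the short sketch below; the per-value class counts follow the standard "Play Golf" frequency tables (they yield exactly the split entropies 0.693, 0.911, 0.788 and 0.892 quoted above).

import math

def entropy(*counts):
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def split_entropy(groups, total):
    # E(S, T): weighted sum of the entropies of the sub-nodes after the split
    return sum((sum(g) / total) * entropy(*g) for g in groups)

total = 14
e_play = entropy(5, 9)                              # Entropy(PlayGolf) ~ 0.94

splits = {
    "Outlook":     [(3, 2), (4, 0), (2, 3)],        # Sunny, Overcast, Rainy
    "Temperature": [(2, 2), (3, 1), (4, 2)],        # Hot, Cold, Mild
    "Humidity":    [(3, 4), (6, 1)],                # High, Normal
    "Windy":       [(6, 2), (3, 3)],                # False, True
}
for name, groups in splits.items():
    print(f"Gain(PlayGolf, {name}) = {e_play - split_entropy(groups, total):.3f}")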
Step 5: Perform the First Split (contd..)
➢ Now that we have all the information gain, we then split the tree based on the attribute with the
highest information gain.
➢ From our calculation, the highest information gain comes from Outlook. Therefore the split will
look like this:
Figure 2: Decision Tree after first split
➢ Now that we have the first stage of the decision tree, we see that we have one leaf node. But we still
need to split the tree further.
➢ To do that, we need to also split the original table to create sub-tables. These sub-tables are given below.
➢ NOTE :- From Table 3, we could see that the Overcast outlook requires no further split because it is just one
homogeneous group. So we have a leaf node.
Step 6: Perform Further Splits (contd..)
➢ The Sunny and the Rainy branches need to be split further.
➢ The Rainy outlook can be split using either Temperature, Humidity or Windy.
➢ The Humidity attribute is best used for this split because it produces homogeneous
groups.
Table 8: Split using Humidity
➢ The Rainy branch can be split on the High and Normal values of Humidity, which gives us the tree
below.
Figure 3: Split using the Humidity Attribute
➢ The Sunny outlook can be split using either Temperature, Humidity or Windy.
➢ Windy attribute would best be used for this split because it produces homogeneous groups.
Table 9: Split using Windy Attribute
➢ NOTE:- If we do the split using the Windy attribute, we would have the final tree that would require
no further splitting! This is shown in Figure 4
Step 7: Complete the Decision Tree
The final decision tree is constructed with leaf nodes representing the decision classes (Yes/No).
Figure 4: Final Decision Tree
DECISION TREE ALGORITHM
Name-Bishnu Prasad Sahoo
Registration number-2002050074
BASIC ALGORITHM OF DECISION TREE
Basic algorithm (a greedy algorithm)
● Tree is constructed in a top-down recursive divide-and-conquer manner
● At start, all the training examples are at the root
● Attributes are categorical (if continuous-valued, they are discretized in
advance)
● Examples are partitioned recursively based on selected attributes
● Test attributes are selected on the basis of a heuristic or statistical measure
(e.g., information gain)
Conditions for stopping partitioning
● All samples for a given node belong to the same class
● There are no remaining attributes for further partitioning - majority voting is
employed for classifying the leaf
● There are no samples left
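A simplified Python sketch (not from the slides) of the top-down, recursive divide-and-conquer procedure above, for categorical attributes; the attribute-selection heuristic (e.g. information gain) is passed in as a function.

from collections import Counter

def build_tree(examples, attributes, select_attribute, default="No"):
    # examples: list of (feature_dict, label); select_attribute: heuristic such as information gain
    if not examples:                        # no samples left
        return default
    labels = [label for _, label in examples]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1:               # all samples belong to the same class
        return labels[0]
    if not attributes:                      # no attributes left: majority voting
        return majority
    best = select_attribute(examples, attributes)            # heuristic / statistical measure
    remaining = [a for a in attributes if a != best]
    node = {best: {}}
    for value in {f[best] for f, _ in examples}:             # partition recursively
        subset = [(f, l) for f, l in examples if f[best] == value]
        node[best][value] = build_tree(subset, remaining, select_attribute, majority)
    return node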
DECISION TREE ALGORITHM
ID3 algorithm
C4.5 algorithm
● A successor of ID3
● Became a benchmark to which newer supervised learning algorithms are
often compared.
● Commercial successor: C5.0
CART (Classification and Regression Trees) algorithm
● The generation of binary decision trees
● Developed by a group of statisticians
ID3 is a strong system that
● Uses hill-climbing search based on the information gain measure to search
through the space of decision trees
● Outputs a single hypothesis
● Never backtracks. It converges to locally optimal solutions
● Uses all training examples at each step, contrary to methods that make
decisions incrementally
● Uses statistical properties of all examples: the search is less sensitive to
errors in individual training examples
However, ID3 has some drawbacks:
● It can only deal with nominal data (continuous attributes must be discretized first)
● It may not be robust in the presence of noise (prone to overfitting)
● It is not able to deal with incomplete data sets (e.g. missing values)
CART(Classification And Regression Trees)
● Developed by Breiman, Friedman, Olshen, Stone in early 80’s.
● Introduced tree based modelling into the statistical mainstream.
● Rigorous approach involving cross validation to select the optimal tree
C4.5
● Be robust in the presence of noise.
● Avoid overfitting.
● Deal with continuous attributes.
● Deal with missing data.
● Convert trees to rules.
Overfitting and Tree Pruning
Overfitting: An induced tree may overfit the training data
● Too many branches, some may reflect anomalies due to noise or outliers
● Poor accuracy for unseen samples
Two approaches to avoid overfitting
➢ Prepruning: Halt tree construction early: do not split a node if this would result in the
goodness measure falling below a threshold
● Difficult to choose an appropriate threshold
➢ Postpruning: Remove branches from a "fully grown" tree to get a sequence of
progressively pruned trees
● Use a set of data different from the training data to decide which is the "best pruned
tree"
Tree Pruning
Cost Complexity pruning
● Post pruning approach used in CART
● Cost complexity - function of number of leaves and error rate of the tree
● For each internal node, the cost complexity is calculated with respect to the original and
pruned versions
● If pruning results in a smaller cost complexity, the subtree is pruned
● Uses a separate prune set
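A hedged sketch of cost-complexity (post-)pruning as exposed by scikit-learn's CART implementation; the dataset and the way the best alpha is chosen on a separate prune set are illustrative choices, not prescribed by the slides.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_prune, y_train, y_prune = train_test_split(X, y, test_size=0.3, random_state=0)

# Candidate effective alphas along the cost-complexity pruning path
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)

# Pick the alpha that scores best on the separate prune set
best_alpha = max(path.ccp_alphas,
                 key=lambda a: DecisionTreeClassifier(random_state=0, ccp_alpha=a)
                               .fit(X_train, y_train).score(X_prune, y_prune))

pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=best_alpha).fit(X_train, y_train)
print("Chosen alpha:", best_alpha, "| number of leaves:", pruned.get_n_leaves())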
Pessimistic Pruning
● Uses training set and adjusts error rates by adding a penalty
Minimum Description Length (MDL) principle
Issues: Repetition and Replication
TOPIC:- EFFECTIVE DECISION TREE AND CONCLUSION
Name-Subham Saurava Panda
Registration number-2002050124
MISSING DATA IN DECISION TREE
Missing data can arise due to various reasons such as incomplete data collection, sensor
malfunctions, or participants opting not to provide certain information. Dealing with missing data in
decision trees involves making decisions at nodes even when some values are missing.
(I) In a decision tree, when a node is being split based on a feature, instances with missing
values for that feature can still be assigned to one of the branches. The decision tree algorithm
considers other available features to determine the best split.
(II) Decision tree algorithms typically include rules for handling instances with missing values.
These rules guide the placement of instances with missing data during the tree-building process.
(III) Some decision tree algorithms may perform automatic imputation for missing values.
Imputation involves estimating or replacing missing values with a predicted or calculated value.
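To illustrate option (III), the sketch below imputes missing values explicitly before fitting a tree with a scikit-learn pipeline; the tiny dataset is made up for demonstration only.

import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

X = np.array([[25.0, 80.0], [40.0, np.nan], [np.nan, 95.0], [52.0, 70.0]])  # NaN = missing
y = np.array([0, 1, 1, 0])

model = make_pipeline(
    SimpleImputer(strategy="mean"),        # replace missing values with the column mean
    DecisionTreeClassifier(random_state=0),
)
model.fit(X, y)
print(model.predict([[30.0, np.nan]]))     # the pipeline imputes before predicting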
EFFECTIVE DECISION TREE
Some key characteristics of an effective decision tree
Interpretability: Decision trees are inherently interpretable. The structure of the tree, with nodes representing
decisions and branches representing outcomes, is easy to understand, making it a valuable tool for explaining
predictions to non-experts.
Simplicity: Effective decision trees are designed to be relatively simple. They aim to capture the essential
patterns in the data without unnecessary complexity. Simplicity helps with both understanding and
implementation.
Versatility: Decision trees can be applied to various types of problems, including classification and regression
tasks. They are capable of handling both categorical and numerical data, making them versatile for a wide range
of applications.
Cont.
Handling Missing Values: Effective decision trees can handle datasets with missing values. They have
mechanisms for making decisions at nodes even when certain values are missing, ensuring that available
information is still utilized.
Pruning for Generalization: Effective decision trees often undergo pruning, a process that removes
unnecessary branches. Pruning helps prevent overfitting, allowing the model to generalize well to new, unseen
data.
Scalability: Decision trees are computationally efficient and can handle datasets of varying sizes. This
scalability makes them suitable for applications in both small and large data environments.
Non-Linearity: Decision trees can model non-linear relationships in the data, allowing them to capture complex
patterns and interactions between variables. This is especially beneficial when the relationships are not easily
represented by linear models.
CONCLUSION
In conclusion, decision trees are powerful tools in the realm of data-driven decision-making. Their
simplicity, interpretability, and versatility make them valuable in various domains, from business
and finance to healthcare and beyond. Effective decision trees exhibit characteristics such as the
ability to handle both categorical and numerical data, automatic feature selection, scalability, and
the capacity to model non-linear relationships.
Their transparency allows decision-makers to understand and trust the decision-making process,
fostering confidence in the model's predictions. The feature importance insights provided by
decision trees contribute to a deeper understanding of the factors influencing outcomes.
CONCLUSION (Cont.)
Furthermore, the adaptability of decision trees to handle missing values and their
incorporation into ensemble methods like Random Forests enhance their robustness and
predictive performance. With proper pruning techniques, decision trees can generalize well
to new data, preventing overfitting.
Ultimately, decision trees offer a clear and accessible framework for navigating complex
decision spaces, making them an indispensable tool for those seeking actionable insights
and informed choices from their data.
THANK
YOU

More Related Content

What's hot

Feature selection concepts and methods
Feature selection concepts and methodsFeature selection concepts and methods
Feature selection concepts and methodsReza Ramezani
 
K nearest neighbor
K nearest neighborK nearest neighbor
K nearest neighborUjjawal
 
Feature Extraction and Principal Component Analysis
Feature Extraction and Principal Component AnalysisFeature Extraction and Principal Component Analysis
Feature Extraction and Principal Component AnalysisSayed Abulhasan Quadri
 
Support Vector Machines for Classification
Support Vector Machines for ClassificationSupport Vector Machines for Classification
Support Vector Machines for ClassificationPrakash Pimpale
 
Randomized algorithm min cut problem and its solution using karger's algorithm
Randomized algorithm min cut problem and its solution using karger's algorithmRandomized algorithm min cut problem and its solution using karger's algorithm
Randomized algorithm min cut problem and its solution using karger's algorithmGaurang Savaliya
 
Decision Tree Learning
Decision Tree LearningDecision Tree Learning
Decision Tree LearningMilind Gokhale
 
Decision tree lecture 3
Decision tree lecture 3Decision tree lecture 3
Decision tree lecture 3Laila Fatehy
 
Unit 2 unsupervised learning.pptx
Unit 2 unsupervised learning.pptxUnit 2 unsupervised learning.pptx
Unit 2 unsupervised learning.pptxDr.Shweta
 
Supervised and Unsupervised Machine Learning
Supervised and Unsupervised Machine LearningSupervised and Unsupervised Machine Learning
Supervised and Unsupervised Machine LearningSpotle.ai
 
Types of clustering and different types of clustering algorithms
Types of clustering and different types of clustering algorithmsTypes of clustering and different types of clustering algorithms
Types of clustering and different types of clustering algorithmsPrashanth Guntal
 
Principal component analysis and lda
Principal component analysis and ldaPrincipal component analysis and lda
Principal component analysis and ldaSuresh Pokharel
 

What's hot (20)

Decision tree
Decision treeDecision tree
Decision tree
 
Feature selection concepts and methods
Feature selection concepts and methodsFeature selection concepts and methods
Feature selection concepts and methods
 
K nearest neighbor
K nearest neighborK nearest neighbor
K nearest neighbor
 
Feature Extraction and Principal Component Analysis
Feature Extraction and Principal Component AnalysisFeature Extraction and Principal Component Analysis
Feature Extraction and Principal Component Analysis
 
Support Vector Machines for Classification
Support Vector Machines for ClassificationSupport Vector Machines for Classification
Support Vector Machines for Classification
 
Randomized algorithm min cut problem and its solution using karger's algorithm
Randomized algorithm min cut problem and its solution using karger's algorithmRandomized algorithm min cut problem and its solution using karger's algorithm
Randomized algorithm min cut problem and its solution using karger's algorithm
 
Decision tree
Decision treeDecision tree
Decision tree
 
Greedy algorithms
Greedy algorithmsGreedy algorithms
Greedy algorithms
 
Decision Tree Learning
Decision Tree LearningDecision Tree Learning
Decision Tree Learning
 
Decision tree lecture 3
Decision tree lecture 3Decision tree lecture 3
Decision tree lecture 3
 
Decision tree
Decision treeDecision tree
Decision tree
 
Decision tree
Decision treeDecision tree
Decision tree
 
Classification Using Decision tree
Classification Using Decision treeClassification Using Decision tree
Classification Using Decision tree
 
Support vector machine
Support vector machineSupport vector machine
Support vector machine
 
Unit 2 unsupervised learning.pptx
Unit 2 unsupervised learning.pptxUnit 2 unsupervised learning.pptx
Unit 2 unsupervised learning.pptx
 
Supervised and Unsupervised Machine Learning
Supervised and Unsupervised Machine LearningSupervised and Unsupervised Machine Learning
Supervised and Unsupervised Machine Learning
 
Types of clustering and different types of clustering algorithms
Types of clustering and different types of clustering algorithmsTypes of clustering and different types of clustering algorithms
Types of clustering and different types of clustering algorithms
 
Regression
RegressionRegression
Regression
 
Principal component analysis and lda
Principal component analysis and ldaPrincipal component analysis and lda
Principal component analysis and lda
 
Terminology Machine Learning
Terminology Machine LearningTerminology Machine Learning
Terminology Machine Learning
 

Similar to Decision Tree Machine Learning Detailed Explanation.

Decision treeinductionmethodsandtheirapplicationtobigdatafinal 5
Decision treeinductionmethodsandtheirapplicationtobigdatafinal 5Decision treeinductionmethodsandtheirapplicationtobigdatafinal 5
Decision treeinductionmethodsandtheirapplicationtobigdatafinal 5ssuser33da69
 
Machine learning - session 3
Machine learning - session 3Machine learning - session 3
Machine learning - session 3Luis Borbon
 
Identifying and classifying unknown Network Disruption
Identifying and classifying unknown Network DisruptionIdentifying and classifying unknown Network Disruption
Identifying and classifying unknown Network Disruptionjagan477830
 
Diabetes Prediction Using Machine Learning
Diabetes Prediction Using Machine LearningDiabetes Prediction Using Machine Learning
Diabetes Prediction Using Machine Learningjagan477830
 
Machine Learning Unit-5 Decesion Trees & Random Forest.pdf
Machine Learning Unit-5 Decesion Trees & Random Forest.pdfMachine Learning Unit-5 Decesion Trees & Random Forest.pdf
Machine Learning Unit-5 Decesion Trees & Random Forest.pdfAdityaSoraut
 
dataminingclassificationprediction123 .pptx
dataminingclassificationprediction123 .pptxdataminingclassificationprediction123 .pptx
dataminingclassificationprediction123 .pptxAsrithaKorupolu
 
Technique for Order Preference by Similarity to Ideal Solution as Decision Su...
Technique for Order Preference by Similarity to Ideal Solution as Decision Su...Technique for Order Preference by Similarity to Ideal Solution as Decision Su...
Technique for Order Preference by Similarity to Ideal Solution as Decision Su...Universitas Pembangunan Panca Budi
 
Research trends in data warehousing and data mining
Research trends in data warehousing and data miningResearch trends in data warehousing and data mining
Research trends in data warehousing and data miningEr. Nawaraj Bhandari
 
Efficient classification of big data using vfdt (very fast decision tree)
Efficient classification of big data using vfdt (very fast decision tree)Efficient classification of big data using vfdt (very fast decision tree)
Efficient classification of big data using vfdt (very fast decision tree)eSAT Journals
 
Internship project report,Predictive Modelling
Internship project report,Predictive ModellingInternship project report,Predictive Modelling
Internship project report,Predictive ModellingAmit Kumar
 
IRJET - A Survey on Machine Learning Algorithms, Techniques and Applications
IRJET - A Survey on Machine Learning Algorithms, Techniques and ApplicationsIRJET - A Survey on Machine Learning Algorithms, Techniques and Applications
IRJET - A Survey on Machine Learning Algorithms, Techniques and ApplicationsIRJET Journal
 
Decision Tree Classifiers to determine the patient’s Post-operative Recovery ...
Decision Tree Classifiers to determine the patient’s Post-operative Recovery ...Decision Tree Classifiers to determine the patient’s Post-operative Recovery ...
Decision Tree Classifiers to determine the patient’s Post-operative Recovery ...Waqas Tariq
 
5. Machine Learning.pptx
5.  Machine Learning.pptx5.  Machine Learning.pptx
5. Machine Learning.pptxssuser6654de1
 
Distributed Digital Artifacts on the Semantic Web
Distributed Digital Artifacts on the Semantic WebDistributed Digital Artifacts on the Semantic Web
Distributed Digital Artifacts on the Semantic WebEditor IJCATR
 
Data Mining Module 3 Business Analtics..pdf
Data Mining Module 3 Business Analtics..pdfData Mining Module 3 Business Analtics..pdf
Data Mining Module 3 Business Analtics..pdfJayanti Pande
 

Similar to Decision Tree Machine Learning Detailed Explanation. (20)

Decision treeinductionmethodsandtheirapplicationtobigdatafinal 5
Decision treeinductionmethodsandtheirapplicationtobigdatafinal 5Decision treeinductionmethodsandtheirapplicationtobigdatafinal 5
Decision treeinductionmethodsandtheirapplicationtobigdatafinal 5
 
Decision tress
  Decision tress  Decision tress
Decision tress
 
Machine learning - session 3
Machine learning - session 3Machine learning - session 3
Machine learning - session 3
 
Identifying and classifying unknown Network Disruption
Identifying and classifying unknown Network DisruptionIdentifying and classifying unknown Network Disruption
Identifying and classifying unknown Network Disruption
 
Diabetes Prediction Using Machine Learning
Diabetes Prediction Using Machine LearningDiabetes Prediction Using Machine Learning
Diabetes Prediction Using Machine Learning
 
Machine Learning Unit-5 Decesion Trees & Random Forest.pdf
Machine Learning Unit-5 Decesion Trees & Random Forest.pdfMachine Learning Unit-5 Decesion Trees & Random Forest.pdf
Machine Learning Unit-5 Decesion Trees & Random Forest.pdf
 
dataminingclassificationprediction123 .pptx
dataminingclassificationprediction123 .pptxdataminingclassificationprediction123 .pptx
dataminingclassificationprediction123 .pptx
 
Technique for Order Preference by Similarity to Ideal Solution as Decision Su...
Technique for Order Preference by Similarity to Ideal Solution as Decision Su...Technique for Order Preference by Similarity to Ideal Solution as Decision Su...
Technique for Order Preference by Similarity to Ideal Solution as Decision Su...
 
Research trends in data warehousing and data mining
Research trends in data warehousing and data miningResearch trends in data warehousing and data mining
Research trends in data warehousing and data mining
 
Efficient classification of big data using vfdt (very fast decision tree)
Efficient classification of big data using vfdt (very fast decision tree)Efficient classification of big data using vfdt (very fast decision tree)
Efficient classification of big data using vfdt (very fast decision tree)
 
Machine Learning - Deep Learning
Machine Learning - Deep LearningMachine Learning - Deep Learning
Machine Learning - Deep Learning
 
data mining.pptx
data mining.pptxdata mining.pptx
data mining.pptx
 
Internship project report,Predictive Modelling
Internship project report,Predictive ModellingInternship project report,Predictive Modelling
Internship project report,Predictive Modelling
 
IRJET - A Survey on Machine Learning Algorithms, Techniques and Applications
IRJET - A Survey on Machine Learning Algorithms, Techniques and ApplicationsIRJET - A Survey on Machine Learning Algorithms, Techniques and Applications
IRJET - A Survey on Machine Learning Algorithms, Techniques and Applications
 
Decision Tree Classifiers to determine the patient’s Post-operative Recovery ...
Decision Tree Classifiers to determine the patient’s Post-operative Recovery ...Decision Tree Classifiers to determine the patient’s Post-operative Recovery ...
Decision Tree Classifiers to determine the patient’s Post-operative Recovery ...
 
5. Machine Learning.pptx
5.  Machine Learning.pptx5.  Machine Learning.pptx
5. Machine Learning.pptx
 
Unit 2-ML.pptx
Unit 2-ML.pptxUnit 2-ML.pptx
Unit 2-ML.pptx
 
Distributed Digital Artifacts on the Semantic Web
Distributed Digital Artifacts on the Semantic WebDistributed Digital Artifacts on the Semantic Web
Distributed Digital Artifacts on the Semantic Web
 
Data Mining Module 3 Business Analtics..pdf
Data Mining Module 3 Business Analtics..pdfData Mining Module 3 Business Analtics..pdf
Data Mining Module 3 Business Analtics..pdf
 
Introduction to machine learning
Introduction to machine learningIntroduction to machine learning
Introduction to machine learning
 

Recently uploaded

18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdfssuser54595a
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactdawncurless
 
mini mental status format.docx
mini    mental       status     format.docxmini    mental       status     format.docx
mini mental status format.docxPoojaSen20
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformChameera Dedduwage
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13Steve Thomason
 
MENTAL STATUS EXAMINATION format.docx
MENTAL     STATUS EXAMINATION format.docxMENTAL     STATUS EXAMINATION format.docx
MENTAL STATUS EXAMINATION format.docxPoojaSen20
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Educationpboyjonauth
 
PSYCHIATRIC History collection FORMAT.pptx
PSYCHIATRIC   History collection FORMAT.pptxPSYCHIATRIC   History collection FORMAT.pptx
PSYCHIATRIC History collection FORMAT.pptxPoojaSen20
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Sapana Sha
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfchloefrazer622
 
Science 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsScience 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsKarinaGenton
 
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxContemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxRoyAbrique
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon AUnboundStockton
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionSafetyChain Software
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingTechSoup
 
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting DataJhengPantaleon
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeThiyagu K
 

Recently uploaded (20)

18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 
mini mental status format.docx
mini    mental       status     format.docxmini    mental       status     format.docx
mini mental status format.docx
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 
MENTAL STATUS EXAMINATION format.docx
MENTAL     STATUS EXAMINATION format.docxMENTAL     STATUS EXAMINATION format.docx
MENTAL STATUS EXAMINATION format.docx
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Education
 
PSYCHIATRIC History collection FORMAT.pptx
PSYCHIATRIC   History collection FORMAT.pptxPSYCHIATRIC   History collection FORMAT.pptx
PSYCHIATRIC History collection FORMAT.pptx
 
Staff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSDStaff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSD
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdf
 
Science 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsScience 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its Characteristics
 
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxContemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
 
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon A
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory Inspection
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
 
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdfTataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 

Decision Tree Machine Learning Detailed Explanation.

  • 1. DECISION TREE Submitted to: Gyanaranjan Shial Assistant Professor Veer Surendra Sai University of Technology, Burla
  • 2. SUBMITTED BY: - SL NO. NAME REGISTRATION NO PAGE NO 1 SASWATI DAS CHOUDHURY 2002030149 4 - 15 2 ASHUTOSH MISHRA 2002030011 16 - 25 3 SATYABRATA DWIVEDY 2004050002 26 - 49 4 SMRITI PANDA 2004050024 50 - 65 5 SWARAJ PRADHAN 2002050082 66 - 82 6 BISHNU PRASAD SAHOO 2002050074 83 - 90 7 SUBHAM SAURAV PANDA 2002050124 91 - 96
  • 3. Content ● Machine Learning ● Decision Tree Overview ● Examples, Splitting Criteria and Process ● Feature Selection and extraction, real world problems ● Training and Testing data set ● Advantages and disadvantages of Decision Tree ● Building Decision Tree ● Decision Tree Algorithms ● Missing Data, Effective decision tree ● Conclusion
  • 4. Name-Saswati Das Choudhury Registration number-2002030149 TOPICS:Machine Learning,Supervised Learning ,Decision Tree Overview and its code in Python
  • 5. Machine Learning ● Machine learning(ML) investigates how computers can learn based on data.ML approaches have been applied to large language models, computer vision, speech recognition, email filtering, agriculture and medicine. ● The term machine learning was coined in 1959 by Arthur Samuel. The synonym self- teaching computers was also used in this time period. ● Machine learning and data mining often employ the same methods and overlap significantly, but while ML focuses on prediction, based on known properties learned from the training data, data mining focuses on the discovery of (previously) unknown properties in the data.
  • 6. ● Machine learning also employs data mining methods as "unsupervised learning" or as a preprocessing step to improve learner accuracy. ● Modern-day machine learning has two objectives, one is to classify data based on models which have been developed, the other purpose is to make predictions for future outcomes based on these models. ● The mathematical foundations of ML are provided by mathematical optimization (mathematical programming) methods. ● Machine learning approaches are traditionally divided into three broad categories supervised learning, unsupervised learning, reinforcement learning.
  • 7.
  • 8. Supervised Learning ● Supervised learning algorithms build a mathematical model of a set of data that contains both the inputs and the desired outputs. The data is known as training data and consists of a set of training examples; each training example is represented by an array or vector, sometimes called a feature vector, and the training data is represented by a matrix. ● Types of supervised-learning algorithms include active learning, classification and regression. Classification algorithms are used when the outputs are restricted to a limited set of values, and regression algorithms are used when the outputs may have any numerical value within a range. For example, for a classification algorithm that filters emails, the input would be an incoming email and the output would be the name of the folder in which to file the email.
  • 9.
  • 10. Decision Tree ● Decision Tree is a supervised learning technique that can be used for both classification and regression problems, but mostly it is preferred for solving classification problems. It is a tree-structured classifier, where internal nodes represent the features of a dataset, branches represent the decision rules and each leaf node represents the outcome. ● In a Decision tree, there are two nodes, which are the Decision Node and Leaf Node. Decision nodes are used to make any decision and have multiple branches, whereas Leaf nodes are the output of those decisions and do not contain any further branches.
  • 11. ● The decisions or tests are performed on the basis of features of the given dataset. ● It is a graphical representation for getting all the possible solutions to a problem/decision based on given conditions. ● A decision tree simply asks a question and, based on the answer (Yes/No), further splits the tree into subtrees. ● The basic algorithm used in decision trees is known as the ID3 algorithm (by Quinlan). The ID3 algorithm builds decision trees using a top-down, greedy approach.
  • 12.
  • 13. Decision Tree code in Python
# Import necessary libraries
from sklearn import tree
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load the iris dataset (as an example)
iris = load_iris()
X = iris.data
y = iris.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
  • 14.
# Create a decision tree classifier
clf = tree.DecisionTreeClassifier()

# Train the classifier on the training set
clf.fit(X_train, y_train)

# Make predictions on the test set
y_pred = clf.predict(X_test)

# Evaluate the accuracy of the model
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy * 100:.2f}%")
  • 15. OUTPUT The output of the code is the accuracy of the decision tree classifier on the test set. Because random_state=42 fixes the split, the result is reproducible; if the dataset or the random splitting changes, the exact accuracy value may differ. Here's an example of what the output might look like: OUTPUT: Accuracy: 100.00%
  • 16. Name-Ashutosh Mishra Registration number-2002030011 TOPICS:- Decision Tree Structure, Examples, Splitting Criteria and Process
  • 17. Decision Tree Algorithm ● The decision tree algorithm belongs to the family of supervised learning algorithms. ● Unlike many other supervised learning algorithms, the decision tree algorithm can be used for solving both regression and classification problems. ● The goal of using a decision tree is to create a model that can be used to predict the class or value of the target variable by learning simple decision rules inferred from the training data. ● In decision trees, to predict a class label for a record we start from the root of the tree.
  • 18. Types of Decision Trees Types of decision trees are based on the type of target variable we have. It can be of two types: 1. Categorical Variable Decision Tree: a decision tree that has a categorical target variable is called a categorical variable decision tree. 2. Continuous Variable Decision Tree: a decision tree that has a continuous target variable is called a continuous variable decision tree.
  • 19.
  • 20. Important Terminology related to Decision Trees ➢ Root Node: It represents the entire population or sample, and this further gets divided into two or more homogeneous sets. ➢ Splitting: It is the process of dividing a node into two or more sub-nodes. ➢ Decision Node: When a sub-node splits into further sub-nodes, it is called a decision node. ➢ Leaf / Terminal Node: Nodes that do not split are called leaf or terminal nodes.
  • 21. ➢ Pruning: When we remove sub-nodes of a decision node, the process is called pruning; it can be seen as the opposite of splitting. ➢ Branch / Sub-Tree: A subsection of the entire tree is called a branch or sub-tree. ➢ Parent and Child Node: A node which is divided into sub-nodes is called the parent node of those sub-nodes, whereas the sub-nodes are the children of the parent node.
  • 22. How do Decision Trees work ? ● The decision of making strategic splits heavily affects a tree’s accuracy. The decision criteria are different for classification and regression trees. ● Decision trees use multiple algorithms to decide to split a node into two or more sub-nodes. ● The creation of sub-nodes increases the homogeneity of resultant sub-nodes and increases purity of the node with respect to the target variable. ● The decision tree splits the nodes on all available variables and then selects the split which results in most homogeneous sub-nodes.
  • 23. Node Splitting in a Decision Tree ● Node splitting, or simply splitting, divides a node into multiple sub-nodes to create relatively pure nodes. ● This is done by finding the best split for a node and can be done in multiple ways. The ways of splitting a node can be broadly divided into two categories based on the type of target variable: ❏ Continuous Target Variable: Reduction in Variance ❏ Categorical Target Variable: Gini Impurity, Information Gain, and Chi-Square
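As a rough illustration of these criteria (not part of the original slides), the Python sketch below computes Gini impurity for a categorical target and reduction in variance for a continuous target; the function names and example numbers are chosen for this illustration only.

import numpy as np

def gini_impurity(counts):
    # Gini impurity of a node given class counts, e.g. [9, 5]
    p = np.asarray(counts, dtype=float)
    p = p / p.sum()
    return 1.0 - np.sum(p ** 2)

def variance_reduction(parent_values, child_value_groups):
    # Reduction in variance for a continuous target after a split
    parent = np.asarray(parent_values, dtype=float)
    n = len(parent)
    weighted_child_var = sum(
        len(g) / n * np.var(np.asarray(g, dtype=float)) for g in child_value_groups
    )
    return np.var(parent) - weighted_child_var

# A node with 9 "Yes" and 5 "No" samples (the same counts used in the Play Golf example later)
print(round(gini_impurity([9, 5]), 3))   # about 0.459
# Splitting a continuous target into two homogeneous halves lowers the variance sharply
print(variance_reduction([1, 2, 3, 10, 11, 12], [[1, 2, 3], [10, 11, 12]]))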
  • 24. Example Question Suppose there is a candidate who has a job offer and wants to decide whether he should accept the offer or not. ● Root node (Salary) ● The root node splits further into the next decision node (distance from the office) ● The next decision node further gets split into one decision node (cab facility) and one leaf node. ● The decision node splits into two leaf nodes (Accepted offer and Declined offer).
  • 25.
  • 26. Feature Selection And Feature Extraction, Real World Problems Name-Satyabrata Dwivedy Registration number-2004050002
  • 27. Need for reduction ● Classification of leukemia tumors from microarray gene expression data ○ 72 patients (data points) ○ 7130 features (expression levels of different genes) ● Text mining, document classification ○ features are words ● Quantitative Structure-Activity Relationship (QSAR) ○ features are molecular descriptors, of which plenty exist ○ biological activity ■ an expression describing the beneficial or adverse effects of a drug on living matter ○ Structure-Activity Relationship (SAR) ■ the hypothesis that similar molecules have similar activities ○ molecular descriptor ■ a mathematical procedure that transforms the chemical information encoded within a symbolic representation of a molecule into a useful number
  • 28. Molecular Descriptor ● adjacency (connectivity) matrix of the molecular graph ● total adjacency index AV (the sum of all entries aij): a measure of the graph connectedness ● Randic connectivity indices: a measure of the molecular branching
  • 29. QSAR • Form a mathematical/statistical relationship (model) between structural (physicochemical) properties and activity. • The mathematical expression can then be used to predict the biological response of other chemical structures, i.e. it maps descriptors to biological activity.
  • 30. Selection vs. Extraction ● In feature selection we try to find the best subset of the input feature set. ● In feature extraction we create new features based on transformation or combination of the original feature set. ● Both selection and extraction lead to the dimensionality reduction. ● No clear cut evidence that one of them is superior to the other on all types of task. Why to do it? ● We’re interested in features – we want to know which are relevant. If we fit a model, it should be interpretable. ■ facilitate data visualization and data understanding ■ reduce experimental costs (measurements) ● We’re interested in prediction – features are not interesting in themselves, we just want to build a good predictor. ■ faster training ■ defy the curse of dimensionality
  • 32. Classification of FS methods • Filter – Assess the relevance of features only by looking at the intrinsic properties of the data. – Usually, calculate the feature relevance score and remove low-scoring features. • Wrapper – Bundle the search for the best model with the FS. – Generate and evaluate various subsets of features. The evaluation is obtained by training and testing a specific ML model. • Embedded – The search for an optimal subset is built into the classifier construction (e.g. decision trees).
  • 33. Filter Methods ● Two steps (score-and-filter approach) ○ assess each feature individually for its potential in discriminating among classes in the data ○ features falling below a threshold are eliminated ● Advantages: ○ easily scale to high-dimensional data ○ simple and fast ○ independent of the classification algorithm ● Disadvantages: ○ ignore the interaction with the classifier ○ most techniques are univariate (each feature is considered separately) ● Scores used in filter methods (complexity O(d)): ○ information measures: information gain, mutual information ○ distance measures: Euclidean distance ○ dependence measures: Pearson correlation coefficient, χ2-test, t-test, AUC
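A minimal sketch of the score-and-filter idea, assuming scikit-learn and reusing the iris data from the earlier slides; SelectKBest with mutual information scores each feature univariately and keeps only the top-scoring ones. The choice of k = 2 is illustrative.

from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, mutual_info_classif

X, y = load_iris(return_X_y=True)

# Score each feature individually (univariate) and keep the top k
selector = SelectKBest(score_func=mutual_info_classif, k=2)
X_reduced = selector.fit_transform(X, y)

print("Feature scores:", selector.scores_)
print("Kept feature indices:", selector.get_support(indices=True))
print("Reduced shape:", X_reduced.shape)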
  • 34. Wrappers ● Search for the best feature subset in combination with a fixed classification method. ● The goodness of a feature subset is determined using cross-validation (k-fold, LOOCV) ● Advantages: ○ interaction between feature subset and model selection ○ take into account feature dependencies ○ generally more accurate ● Disadvantages: ○ higher risk of overfitting than filter methods ○ very computationally intensive
  • 35. Exhaustive Search • Evaluate all possible subsets using exhaustive search – this leads to the optimum subset. • For a total of d variables and a subset of size p, the total number of possible subsets is the binomial coefficient d! / (p! (d − p)!), e.g. d = 100, p = 10 → ≈2×10^13. • Complexity: O(2^d) (exponential). • There are various strategies to reduce the search space. – They are still O(2^d), but much faster (at least 1000 times), e.g. “branch and bound”.
  • 36. Deterministic ● Sequential Forward Selection (SFS) ● Sequential Backward Selection (SBS) ● “ Plus q take away r ” Selection ● Sequential Forward Floating Search (SFFS) ● Sequential Backward Floating Search (SBFS)
  • 37. Sequential Forward Selection • SFS • At the beginning select the best feature using a scalar criterion function. • Add one feature at a time which along with already selected features maximizes the criterion function. • A greedy algorithm, cannot retract (also called nesting effect). • Complexity is O(d)
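A minimal wrapper-style SFS sketch, assuming a reasonably recent scikit-learn (SequentialFeatureSelector was added in version 0.24); the decision tree estimator, the iris data and the target of two selected features are illustrative choices, not part of the original slides.

from sklearn.datasets import load_iris
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Forward selection: add one feature at a time, evaluating each
# candidate subset with cross-validation on the wrapped model
sfs = SequentialFeatureSelector(
    DecisionTreeClassifier(random_state=42),
    n_features_to_select=2,
    direction="forward",
    cv=5,
)
sfs.fit(X, y)
print("Selected feature indices:", sfs.get_support(indices=True))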
  • 38. Sequential Backward Selection • SBS • At the beginning select all d features. • Delete one feature at a time and select the subset which maximize the criterion function. • Also a greedy algorithm, cannot retract. • Complexity is O(d).
  • 39. “Plus q take away r” Selection • First add q features by forward selection, then discard r features by backward selection • Need to decide the optimal q and r • Avoids the subset nesting problems of SFS and SBS
  • 40. Sequential Forward Floating Search • SFFS • It is a generalized “plus q take away r” algorithm • The values of q and r are determined automatically • Close to the optimal solution • Affordable computational cost • A backward counterpart (SBFS) exists as well
  • 41. Embedded FS ● The feature selection process is done inside the ML algorithm. ● Decision trees ○ In final tree, only a subset of features are used ● Regularization ○ It effectively “shuts down” unnecessary features. ○ Pruning in NN.
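A minimal sketch of embedded selection with a decision tree, assuming scikit-learn; after fitting, only the features actually used in splits receive nonzero impurity-based importance, and the zero/nonzero threshold here is purely illustrative.

import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Fitting the tree performs feature selection implicitly:
# only features chosen for splits get nonzero importance
clf = DecisionTreeClassifier(max_depth=3, random_state=42).fit(X, y)
importances = clf.feature_importances_

selected = np.flatnonzero(importances > 0)
print("Importances:", np.round(importances, 3))
print("Features used by the tree:", selected)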
  • 43. ● FS – identify and select the “best” features with respect to the target task. ● Selected features retain their original physical interpretation. ● FE – create new features as a transformation (combination) of original features. Usually followed by FS. ● May provide better discriminatory ability than the best subset. ● Do not retain the original physical interpretation, may not have clear meaning.
  • 45. Centering: make the data have zero mean (i.e. shift the data so that its mean moves to the point [0, 0]).
  • 46. The direction along which the variability in the data is highest is called the 1st principal component; the direction orthogonal to it is the 2nd principal component.
  • 47. Principal components (PCs) are linear combinations of the original coordinates, e.g. w0 + w1x1 + w2x2 and w’0 + w’1x1 + w’2x2. The coefficients of the linear combination (w0, w1, …) are called loadings. In the transformed coordinate system, individual data points have different coordinates; these are called scores.
  • 48. ● PCA is an orthogonal linear transformation that changes the data into a new coordinate system such that the variance is put in order from the greatest to the least. ● Solving the problem = finding the new orthogonal coordinate system = finding the loadings. ● PCs (vectors) and their corresponding variances (scalars) are found by eigenvalue decomposition of the covariance matrix C = XX^T of the xi variables. ○ The eigenvector corresponding to the largest eigenvalue is the 1st PC. ○ The 2nd eigenvector (the 2nd largest eigenvalue) is orthogonal to the 1st one, and so on. ● The eigenvalue decomposition is computed using standard algorithms: eigen decomposition of the covariance matrix (e.g. the QR algorithm) or SVD of the mean-centered data matrix.
  • 49. Interpretation of PCA ● New variables (PCs) have a variance equal to their corresponding eigenvalue: Var(Yi) = λi for all i = 1…p ● Small λi ⬄ small variance ⬄ the data changes little in the direction of component Yi ● The relative variance explained by each PC is given by λi / Σ λi How many components? ● Enough PCs to have a cumulative variance explained by the PCs that is >50-70% ● Kaiser criterion: keep PCs with eigenvalues >1 ● Scree plot: represents the ability of the PCs to explain the variation in the data
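A minimal PCA sketch, assuming scikit-learn and the iris data; it prints the per-component and cumulative explained variance, and the 70% cut-off follows the guideline on this slide. Standardizing before PCA is an illustrative preprocessing choice.

import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_iris(return_X_y=True)

# Center (and here also scale) the data before computing principal components
X_std = StandardScaler().fit_transform(X)

pca = PCA()
scores = pca.fit_transform(X_std)   # coordinates of the points in the new PC system
loadings = pca.components_          # coefficients of the linear combinations (loadings)

explained = pca.explained_variance_ratio_
cumulative = np.cumsum(explained)
print("Explained variance ratio:", np.round(explained, 3))
print("Cumulative:", np.round(cumulative, 3))

# Keep enough PCs to explain at least 70% of the variance
n_components = int(np.searchsorted(cumulative, 0.70) + 1)
print("Components kept:", n_components)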
  • 50. Topic:-Training and Testing sets Advantages and Disadvantages of Decision Tree Name-Smriti Panda Registration number-2004050024
  • 51. Training Set of data ● A decision tree is consistent with a training set. • The training set is used to build the decision tree. During this phase: ❑ The algorithm selects the best attribute to split the data based on metrics like entropy or Gini impurity. ❑ The goal is to find the attribute that maximizes information gain or reduces impurity after the split. ❑ The decision tree is constructed by recursively partitioning the data based on attribute values. ❑ Each node in the tree represents a split point based on an attribute. ❑ The tree grows until a stopping criterion (e.g., maximum depth or minimum samples per leaf) is met.
  • 52. Training Set of data ● Typically, for decision tree classification, the model should be learned on training data with a predefined set of labels. ● It would predict a label (i.e., class) for new samples. ● So, we have a dataset with different attributes (features). Each sample has its own combination of the value of the features.
  • 54. Training Procedure The steps involved in the system model of training are as follows: 1. Analysis and Identification: Analyze and identify the training needs: who needs training, what they need to learn, the estimated training cost, etc. The next step is to develop a performance measure on the basis of which actual performance will be evaluated. 2. Designing: Design and provide training to meet the identified needs. This step requires developing the objectives of training, identifying the learning steps, and sequencing and structuring the contents. 3. Developing: This phase requires listing the activities in the training program that will assist the participants to learn, selecting the delivery method, examining the training material and validating the information to be imparted to make sure it accomplishes all the goals and objectives.
  • 55. Training Procedure 4. Implementation: Implementation is the hardest part of the system, because one wrong step can lead to the failure of the whole training program. 5. Evaluation: Evaluate each phase to make sure it has achieved its aim in terms of subsequent work performance, making necessary amendments to any of the previous stages in order to remedy or improve failed practices.
  • 56. Testing Set of data • The test set is used to evaluate the performance of the decision tree. • After constructing the tree using the training data, you evaluate how well it performs on unseen data. • For each instance in the test set, you call a function (often named classify), passing in the newly-built tree and the data point you want to classify. • The function returns the leaf node to which the data point belongs, effectively assigning a class label. • By comparing the assigned class label to the actual label, you assess the tree’s performance. • A common practice is to shuffle the data and allocate 80% to training and the remaining 20% to testing. • The training set helps the decision tree to learn, while the test set evaluates its accuracy.
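A minimal sketch of this protocol, assuming scikit-learn and the iris data; train_test_split performs the shuffle and the 80/20 allocation described above, and clf.predict plays the role of the classify function applied to each test instance.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Shuffle the data and allocate 80% for training, 20% for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=True, random_state=0
)

clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# "Classify" each test instance with the newly built tree and compare to the actual label
correct = sum(clf.predict([x])[0] == label for x, label in zip(X_test, y_test))
print("Test accuracy:", correct / len(y_test))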
  • 58. Flow Chart of training and testing model
  • 59. Model-based testing • Model-based testing is a software testing technique where the run time behavior of the software under test is checked against predictions made by a model. • A model is a description of a system’s behavior. • Behavior can be described in terms of input sequences, actions, conditions, output, and flow of data from input to output. • The model should be practically understandable, reusable and shareable, and must give a precise description of the system under test.
  • 60. Example: Predicting Diabetes • To illustrate the use of decision trees, let's consider a simple example of predicting diabetes based on certain features.
  • 61. Example: Predicting Diabetes • In this example, we used the diabetes_data.csv dataset, which contains various features related to diabetes, such as age, blood pressure, and glucose level. • The target variable, Outcome, indicates whether the patient has diabetes (1) or not (0). • We split the data into training and testing sets and then built a decision tree model using the DecisionTreeClassifier class from scikit-learn. • Finally, we evaluated the model on the testing set and printed the accuracy.
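A hedged sketch of the example described above; the file name diabetes_data.csv and the Outcome column come from the slide, but the exact schema of the file is an assumption, so the feature columns are left implicit.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Load the dataset referred to in the slides (path and schema assumed)
data = pd.read_csv("diabetes_data.csv")
X = data.drop(columns=["Outcome"])   # features such as age, blood pressure, glucose level
y = data["Outcome"]                  # 1 = has diabetes, 0 = does not

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

clf = DecisionTreeClassifier(random_state=42)
clf.fit(X_train, y_train)

print("Accuracy:", accuracy_score(y_test, clf.predict(X_test)))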
  • 62. A simple decision tree The figure shows that the decision tree starts from the root node and, after numerous rounds of training and testing, gives leaf nodes as the result.
  • 63. Advantages of Decision Tree ● Non-Parametric: Decision trees do not assume specific underlying data distributions. This flexibility allows them to be applied to diverse problems without worrying about model assumptions. ● Handling Categorical Values: Decision trees can naturally handle categorical features without requiring explicit encoding or transformation. ● Minimal Data Preparation: Unlike some other algorithms, decision trees require minimal data preprocessing. They can work directly with raw features, reducing the need for extensive feature engineering. ● Non-Linear Models: Decision trees are inherently non-linear. They represent piece-wise functions of various features in the feature space, making them suitable for complex problems where linearity cannot be assumed.
  • 64. Advantages of Decision Tree • Relatively Easy to Interpret: Trained decision trees are generally intuitive to understand. Their entire structure can be visualized as a simple flow chart, making it easier for analysts and stakeholders to grasp the decision-making process. • Robust to Outliers: Well-regularized decision trees handle outliers well. Predictions are generated by aggregating over a subsample of training data, reducing the impact of outliers. • Can Deal with Missing Values: The CART (Classification and Regression Trees) algorithm naturally handles missing values. Decision trees can be constructed without additional preprocessing to address missing data. • Combining Features for Predictions: Decision trees combine decision rules (if-else conditions on input features) via AND relationships as they traverse the tree. This enables the use of feature combinations in making predictions.
  • 65. Disadvantages of Decision Tree ● Prone to Overfitting: Decision trees can become overly complex and fit noise in the training data, leading to poor generalization on unseen data. ● Sensitive to Noise: Decision trees can be sensitive to noisy data, especially when the tree is deep. ● Sensitive to Changes in Data: Small changes in the training data can significantly affect the tree’s structure, making it unstable. ● Greedy Algorithm: The tree-building process is greedy, meaning it makes locally optimal decisions at each split without considering global implications. ● Non-Continuous Predictions: Decision trees produce step-like predictions, which may not be suitable for problems requiring smooth outputs.
  • 66. Building a Decision Tree: A Step-by-Step Approach Constructing a decision tree for the "Play Golf" dataset. Name-Swaraj Pradhan Registration number-2002050082
  • 67. Consider the table below. It represents factors that affect whether John would go out to play golf or not. Using the data in the table, build a decision tree model that can be used to predict whether John would play golf or not. Figure 1: "Play Golf" dataset
  • 68. Step 1: Determine the Decision Column ➢ Since decision trees are used for classification, you need to determine the classes which are the basis for the decision. ➢ In this case, it is the last column, that is, the Play Golf column with classes Yes and No. ➢ To determine the root node we need to compute the entropy. ➢ To do this, we create a frequency table for the classes (the Yes/No column). Table 2: Frequency Table
  • 69. Step 2: Calculating Entropy for the Classes (Play Golf) Entropy ➢ It is the measure of impurity (or uncertainty) in the data. It lies between 0 and 1 and is calculated as E(S) = −Σ pi log2(pi), where pi is the proportion of class i in the node. ➢ Compute the entropy for the decision column (Play Golf) using the frequency table: Entropy(PlayGolf) = E(5,9) = −(5/14) log2(5/14) − (9/14) log2(9/14) ≈ 0.94
  • 70. Step 3: Calculate Entropy for the Other Attributes After the Split (contd.) For the other four attributes, we need to calculate the entropy after each split. ● E(PlayGolf, Outlook) ● E(PlayGolf, Temperature) ● E(PlayGolf, Humidity) ● E(PlayGolf, Windy) ➢ The entropy for two variables is calculated as the weighted sum E(S, T) = Σc P(c) · E(c), taken over the values c of the splitting attribute T. ➢ Therefore, to calculate E(PlayGolf, Outlook), we use: E(PlayGolf, Outlook) = P(Sunny) E(3,2) + P(Overcast) E(4,0) + P(Rainy) E(2,3)
  • 71. This frequency table is given below: Table 3: Frequency Table for Outlook Let’s go ahead and calculate E(3,2) = −(3/5) log2(3/5) − (2/5) log2(2/5) ≈ 0.971. We do not need to calculate the second and third terms separately, because E(4,0) = 0 and E(2,3) = E(3,2).
  • 72.
  • 73. E (PlayGolf, Temperature) Calculation Table 4: Frequency Table for Temperature E(PlayGolf, Temperature) = P(Hot) E(2,2) + P(Cold) E(3,1) + P(Mild) E(4,2)
  • 74. E (PlayGolf, Humidity) Calculation Table 5: Frequency Table for Humidity
  • 75. E (PlayGolf, Windy) Calculation Table 6: Frequency Table for Windy
  • 76. Step 4: Calculating Information Gain for Each Split ● Calculate information gain for each attribute using the formula: Gain(S, T) = Entropy(S) – Entropy(S, T). ● Then the attribute with the largest information gain is used for the split. Gain(PlayGolf, Outlook) = Entropy(PlayGolf) – Entropy(PlayGolf, Outlook) = 0.94 – 0.693 = 0.247 Gain(PlayGolf, Temperature) = Entropy(PlayGolf) – Entropy(PlayGolf, Temperature) = 0.94 – 0.911 = 0.029 Gain(PlayGolf, Humidity) = Entropy(PlayGolf) – Entropy(PlayGolf, Humidity) = 0.94 – 0.788 = 0.152 Gain(PlayGolf, Windy) = Entropy(PlayGolf) – Entropy(PlayGolf, Windy) = 0.94 – 0.892 = 0.048
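The arithmetic above can be reproduced with a short Python sketch that works directly from the frequency counts quoted in the tables (the counts are taken from the slides; the helper function names are illustrative).

import math

def entropy(counts):
    # Entropy of a class distribution given as counts, e.g. (9, 5)
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

def split_entropy(groups):
    # Weighted entropy E(S, T) after splitting into the given groups
    total = sum(sum(g) for g in groups)
    return sum(sum(g) / total * entropy(g) for g in groups)

play_golf = (9, 5)                   # 9 Yes, 5 No
outlook = [(3, 2), (4, 0), (2, 3)]   # Sunny, Overcast, Rainy

e_root = entropy(play_golf)
e_outlook = split_entropy(outlook)
print(round(e_root, 3))              # about 0.940
print(round(e_outlook, 3))           # about 0.694
print(round(e_root - e_outlook, 3))  # Gain(PlayGolf, Outlook), about 0.247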
  • 77. Step 5: Perform the First Split (contd..) ➢ Now that we have all the information gain, we then split the tree based on the attribute with the highest information gain. ➢ From our calculation, the highest information gain comes from Outlook. Therefore the split will look like this: Figure 2: Decision Tree after first split
  • 78. ➢ Now that we have the first stage of the decision tree, we see that we have one leaf node, but we still need to split the tree further. ➢ To do that, we also need to split the original table to create sub-tables. These sub-tables are given below. ➢ NOTE: From Table 3, we can see that the Overcast outlook requires no further split because it is just one homogeneous group, so we have a leaf node.
  • 79. Step 6: Perform Further Splits (contd.) ➢ The Sunny and Rainy branches need to be split. ➢ The Rainy outlook can be split using either Temperature, Humidity or Windy. ➢ The Humidity attribute is best used for this split because it produces homogeneous groups. Table 8: Split using Humidity
  • 80. ➢ The Rainy branch can be split on the High and Normal values of Humidity, which gives us the tree below. Figure 3: Split using the Humidity Attribute
  • 81. ➢ The Sunny outlook can be split using either Temperature, Humidity or Windy. ➢ Windy attribute would best be used for this split because it produces homogeneous groups. Table 9: Split using Windy Attribute ➢ NOTE:- If we do the split using the Windy attribute, we would have the final tree that would require no further splitting! This is shown in Figure 4
  • 82. Step 7: Complete the Decision Tree The final decision tree is constructed with leaf nodes representing the decision classes (Yes/No). Figure 4: Final Decision Tree
  • 83. DECISION TREE ALGORITHM Name-Bishnu Prasad Sahoo Registration number-2002050074
  • 84. BASIC ALGORITHM OF DECISION TREE Basic algorithm (a greedy algorithm) ● Tree is constructed in a top-down recursive divide-and-conquer manner ● At start, all the training examples are at the root ● Attributes are categorical (if continuous-valued, they are discretized in advance) ● Examples are partitioned recursively based on selected attributes ● Test attributes are selected on the basis of a heuristic or statistical measure (e.g., information gain) Conditions for stopping partitioning ● All samples for a given node belong to the same class ● There are no remaining attributes for further partitioning - majority voting is employed for classifying the leaf ● There are no samples left
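A minimal recursive sketch of this top-down, greedy procedure for categorical attributes (an illustration of the stopping conditions above, not the exact ID3/C4.5 implementation); choose_attribute stands for any heuristic such as information gain, and all helper names are assumptions.

from collections import Counter

def build_tree(rows, attributes, target, choose_attribute):
    if not rows:
        return None  # no samples left; the caller should fall back to a majority class

    labels = [row[target] for row in rows]

    # Stop: all samples at this node belong to the same class
    if len(set(labels)) == 1:
        return labels[0]
    # Stop: no remaining attributes, so use majority voting for the leaf
    if not attributes:
        return Counter(labels).most_common(1)[0][0]

    # Greedy step: pick the attribute with the best heuristic score
    best = choose_attribute(rows, attributes, target)
    subtree = {}
    # Partition the examples recursively on the observed values of the chosen attribute
    for value in set(row[best] for row in rows):
        subset = [row for row in rows if row[best] == value]
        remaining = [a for a in attributes if a != best]
        subtree[value] = build_tree(subset, remaining, target, choose_attribute)
    return {best: subtree}

A choose_attribute based on the information gain computed in the Play Golf section would turn this sketch into an ID3-style learner.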
  • 85. DECISION TREE ALGORITHM ID3 algorithm C4.5 algorithm ● A successor of ID3 ● Became a benchmark to which newer supervised learning algorithms are often compared. ● Commercial successor: C5.0 CART (Classification and Regression Trees) algorithm ● The generation of binary decision trees ● Developed by a group of statisticians
  • 86. ID3 is a strong system that ● Uses hill-climbing search based on the information gain measure to search through the space of decision trees ● Outputs a single hypothesis ● Never backtracks; it converges to locally optimal solutions ● Uses all training examples at each step, contrary to methods that make decisions incrementally ● Uses statistical properties of all examples: the search is less sensitive to errors in individual training examples However, ID3 has some drawbacks ● It can only deal with nominal data (it cannot handle continuous attributes directly) ● It may not be robust in the presence of noise (risk of overfitting) ● It is not able to deal with incomplete data sets (e.g. missing values)
  • 87.
  • 88. CART (Classification And Regression Trees) ● Developed by Breiman, Friedman, Olshen and Stone in the early 80s. ● Introduced tree-based modelling into the statistical mainstream. ● Rigorous approach involving cross-validation to select the optimal tree. C4.5 ● Is robust in the presence of noise. ● Avoids overfitting. ● Deals with continuous attributes. ● Deals with missing data. ● Converts trees to rules.
  • 89. Overfitting and Tree Pruning Overfitting: An induced tree may overfit the training data ● Too many branches, some of which may reflect anomalies due to noise or outliers ● Poor accuracy for unseen samples Two approaches to avoid overfitting ➢ Prepruning: Halt tree construction early; do not split a node if this would result in the goodness measure falling below a threshold ● Difficult to choose an appropriate threshold ➢ Postpruning: Remove branches from a "fully grown" tree to get a sequence of progressively pruned trees ● Use a set of data different from the training data to decide which is the "best pruned tree"
  • 90. Tree Pruning Cost Complexity pruning ● Post pruning approach used in CART ● Cost complexity - function of number of leaves and error rate of the tree ● For each internal node cost complexity is calculated wrt original and pruned versions ● If pruning results in a smaller cost complexity - subtree is pruned ● Uses a separate prune set Pessimistic Pruning ● Uses training set and adjusts error rates by adding a penalty Minimum Description Length (MDL) principle Issues: Repetition and Replication
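A hedged scikit-learn sketch of both ideas: prepruning via explicit limits (max_depth, min_samples_leaf) and CART-style cost-complexity postpruning via ccp_alpha. In practice the pruning strength should be chosen on a separate prune/validation set, as the slide says; the test set is reused below only to keep the sketch short.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Prepruning: halt tree construction early with explicit limits
pre = DecisionTreeClassifier(max_depth=3, min_samples_leaf=5, random_state=42)
pre.fit(X_train, y_train)

# Postpruning: compute the cost-complexity path, then refit with each candidate alpha
path = DecisionTreeClassifier(random_state=42).cost_complexity_pruning_path(X_train, y_train)
scores = []
for alpha in path.ccp_alphas:
    clf = DecisionTreeClassifier(ccp_alpha=alpha, random_state=42).fit(X_train, y_train)
    scores.append((clf.score(X_test, y_test), alpha))

best_score, best_alpha = max(scores)
print("Prepruned accuracy:", pre.score(X_test, y_test))
print("Best postpruned accuracy:", best_score, "at ccp_alpha =", round(best_alpha, 4))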
  • 91. TOPIC:- EFFECTIVE DECISION TREE AND CONCLUSION Name-Subham Saurava Panda Registration number-2002050124
  • 92. MISSING DATA IN DECISION TREE Missing data can arise due to various reasons such as incomplete data collection, sensor malfunctions, or participants opting not to provide certain information. Dealing with missing data in decision trees involves making decisions at nodes even when some values are missing. (I) In a decision tree, when a node is being split based on a feature, instances with missing values for that feature can still be assigned to one of the branches. The decision tree algorithm considers other available features to determine the best split. (II) Decision tree algorithms typically include rules for handling instances with missing values. These rules guide the placement of instances with missing data during the tree-building process. (III) Some decision tree algorithms may perform automatic imputation for missing values. Imputation involves estimating or replacing missing values with a predicted or calculated value.
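A minimal sketch of point (III), assuming scikit-learn; SimpleImputer fills in missing values (here with the column mean) before the tree is fitted, and the tiny NaN-containing array is invented purely for illustration.

import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

# Toy data with missing values (np.nan) in both features
X = np.array([[1.0, 2.0], [3.0, np.nan], [5.0, 6.0], [np.nan, 8.0]])
y = np.array([0, 0, 1, 1])

# Impute missing entries (column mean), then fit the decision tree
model = make_pipeline(SimpleImputer(strategy="mean"), DecisionTreeClassifier(random_state=0))
model.fit(X, y)

print(model.predict([[4.0, np.nan]]))   # imputation is applied before prediction as well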
  • 93. EFFECTIVE DECISION TREE Some key characteristics of an effective decision tree: Interpretability: Decision trees are inherently interpretable. The structure of the tree, with nodes representing decisions and branches representing outcomes, is easy to understand, making it a valuable tool for explaining predictions to non-experts. Simplicity: Effective decision trees are designed to be relatively simple. They aim to capture the essential patterns in the data without unnecessary complexity. Simplicity helps with both understanding and implementation. Versatility: Decision trees can be applied to various types of problems, including classification and regression tasks. They are capable of handling both categorical and numerical data, making them versatile for a wide range of applications.
  • 94. Cont. Handling Missing Values: Effective decision trees can handle datasets with missing values. They have mechanisms for making decisions at nodes even when certain values are missing, ensuring that available information is still utilized. Pruning for Generalization: Effective decision trees often undergo pruning, a process that removes unnecessary branches. Pruning helps prevent overfitting, allowing the model to generalize well to new, unseen data. Scalability: Decision trees are computationally efficient and can handle datasets of varying sizes. This scalability makes them suitable for applications in both small and large data environments. Non-Linearity: Decision trees can model non-linear relationships in the data, allowing them to capture complex patterns and interactions between variables. This is especially beneficial when the relationships are not easily represented by linear models.
  • 95. CONCLUSION In conclusion, decision trees are powerful tools in the realm of data-driven decision-making. Their simplicity, interpretability, and versatility make them valuable in various domains, from business and finance to healthcare and beyond. Effective decision trees exhibit characteristics such as the ability to handle both categorical and numerical data, automatic feature selection, scalability, and the capacity to model non-linear relationships. Their transparency allows decision-makers to understand and trust the decision-making process, fostering confidence in the model's predictions. The feature importance insights provided by decision trees contribute to a deeper understanding of the factors influencing outcomes.
  • 96. CONCLUSION (Cont.) Furthermore, the adaptability of decision trees to handle missing values and their incorporation into ensemble methods like Random Forests enhance their robustness and predictive performance. With proper pruning techniques, decision trees can generalize well to new data, preventing overfitting. Ultimately, decision trees offer a clear and accessible framework for navigating complex decision spaces, making them an indispensable tool for those seeking actionable insights and informed choices from their data.