Copyright © 2023 Jayanti Rajdevendra Pande. All rights reserved.
RASHTRASANT TUKDOJI MAHARAJ NAGPUR UNIVERSITY
MBA
SEMESTER: 3
SPECIALIZATION
BUSINESS ANALYTICS (BA 2)
SUBJECT
DATA MINING
MODULE NO : 3
DECISION TREES & DECISION RULES
Jayanti R Pande
DGICM College, Nagpur
Q1. What are Decision Trees? Explain their applications.
• DECISION TREES are supervised learning methods used for constructing models from input-output samples.
• They serve the dual purpose of handling both classification and regression tasks.
• These models exhibit a hierarchical structure, forming through recursive splits at decision nodes using test functions.
• Typically, decision trees follow a top-down strategy to search for solutions within the dataset.
• Nodes within decision trees test attributes, often using a univariate approach, assessing a single attribute per node.
• The branches stemming from nodes depict outcomes of attribute tests, resulting in the partitioning of data into subsets.
• For example, in a scenario with attributes X and Y, samples meeting conditions like X > 1 and Y = B might belong to a specific
class.
• Algorithmically, decision trees select attributes to partition samples and create branches, sorting data into respective child
nodes based on attribute values.
• Each path from the root to a leaf node represents a classification rule.
• The choice of attribute at each node significantly impacts the tree's structure and predictive capacity.
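The partitioning and root-to-leaf classification described above can be sketched in a few lines of Python. This is only an illustration: the tree, the attribute names X and Y, and the leaf labels are assumptions chosen to echo the example in the text, not a tree learned from data.

```python
# Hypothetical tree: internal nodes test one attribute (univariate), leaves hold a class.
# The leaf labels are illustrative assumptions.
TREE = {
    "test": lambda s: "yes" if s["X"] > 1 else "no",   # root node tests attribute X
    "branches": {
        "no": "Class1",
        "yes": {
            "test": lambda s: s["Y"],                  # child node tests attribute Y
            "branches": {"A": "Class1", "B": "Class2", "C": "Class2"},
        },
    },
}

def classify(node, sample):
    """Follow one root-to-leaf path; each step applies a single-attribute test."""
    while isinstance(node, dict):
        outcome = node["test"](sample)
        node = node["branches"][outcome]
    return node

def partition(node, samples):
    """Sort samples into the child branches of one decision node."""
    subsets = {}
    for s in samples:
        subsets.setdefault(node["test"](s), []).append(s)
    return subsets

samples = [{"X": 2.0, "Y": "B"}, {"X": 0.5, "Y": "A"}, {"X": 3.0, "Y": "A"}]
labels = [classify(TREE, s) for s in samples]
root_subsets = partition(TREE, samples)
```

Here `labels` holds one class per sample, and `root_subsets` shows how the root test divides the samples between its "yes" and "no" branches.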
APPLICATIONS OF DECISION TREES
• Classification Tasks: Decision trees are extensively used in classification tasks, where they categorize data into distinct classes
or categories based on input features. Applications include customer segmentation in marketing, fraud detection in finance,
and medical diagnosis in healthcare.
• Regression Analysis: They are applied in regression analysis to predict continuous values, making them valuable for
forecasting, pricing strategies, and trend predictions. Used in sales forecasting, risk assessment in finance, and predicting
housing prices in real estate.
• Feature Selection: Decision trees help in identifying and selecting the most relevant and influential features affecting the
target variable. Widely used for feature selection in various machine learning models to improve performance and efficiency.
A simple decision tree for classifying samples with two input attributes, X and Y, is given in the figure below. All samples with
feature values X > 1 and Y = B belong to Class2, while the samples with values X < 1 belong to Class1, whatever the value of
feature Y. The samples at a nonleaf node in the tree structure are thus partitioned along the branches, and each child node
gets its corresponding subset of samples.
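[Figure: decision tree with two test nodes. The root tests X > 1 (branches Yes and No); a second node tests Y (branches Y = A, Y = B, Y = C). The four leaves are labeled Class 1, Class 2, Class 2, Class 1.]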
Q2. How is decision tree pruning done?
DECISION TREE PRUNING is a technique used to simplify decision trees by removing one or more subtrees and replacing them
with leaves. This process aims to reduce complexity, improve comprehensibility, and potentially enhance the predictive
accuracy of the model.
Pruning involves the following methodologies:
• Pre-pruning:
It involves making decisions before splitting nodes based on predefined conditions or statistical tests, such as the χ2 test.
The stopping criterion is used to determine whether to continue splitting a node. If there's no significant improvement in
classification accuracy after a split, the current node is represented as a leaf node, preventing further division.
• Post-pruning:
Post-pruning involves retrospectively removing parts of the tree structure based on selected accuracy criteria after the entire
tree has been constructed.
The decision to prune or remove subtrees is made by assessing the contribution of each subtree to the classification accuracy
of unseen testing samples.
To estimate the predictive error rate accurately, additional techniques like cross-validation or using a separate test dataset are
employed. Cross-validation involves dividing the available samples into blocks, constructing the tree from all except one block,
and testing it on the remaining block. This iterative process helps in assessing the tree's performance and identifying areas
that can be pruned to simplify the model without compromising predictive accuracy on unseen data.
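The block-wise cross-validation procedure described above can be sketched as follows. The function names and the majority-class stand-in model are assumptions made for illustration; in practice `fit` would build a (pruned) decision tree.

```python
def k_fold_blocks(n_samples, k):
    """Split indices 0..n_samples-1 into k nearly equal contiguous blocks."""
    base, extra = divmod(n_samples, k)
    blocks, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        blocks.append(list(range(start, start + size)))
        start += size
    return blocks

def cross_validate(samples, labels, k, fit, predict):
    """Average error rate over k rounds; each round holds out one block for testing."""
    blocks = k_fold_blocks(len(samples), k)
    errors = []
    for held_out in blocks:
        held = set(held_out)
        train_idx = [i for i in range(len(samples)) if i not in held]
        model = fit([samples[i] for i in train_idx], [labels[i] for i in train_idx])
        wrong = sum(predict(model, samples[i]) != labels[i] for i in held_out)
        errors.append(wrong / len(held_out))
    return sum(errors) / k

# Stand-in model (an assumption for illustration): always predict the majority class.
def fit_majority(xs, ys):
    return max(set(ys), key=ys.count)

def predict_majority(model, x):
    return model

xs = list(range(10))
ys = ["a"] * 8 + ["b"] * 2
error = cross_validate(xs, ys, k=5, fit=fit_majority, predict=predict_majority)
```

Comparing this estimated error for the full tree and for a candidate pruned tree is one way to decide whether a subtree can be replaced by a leaf. The sketch assumes there are at least k samples, so that no block is empty.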
Q3. Explain the limitations of Decision Trees and Decision Rules.
Decision-rule and decision-tree-based models offer simplicity, speed, and readability, without relying on stringent assumptions
about data distribution or attribute independence, making them generally robust across tasks. However, these methods come
with certain limitations:
1 Limited Suitability for Regression:
Decision trees can be applied to regression, but they predict a constant value per leaf and so approximate smooth continuous
targets only coarsely. They are also prone to overfitting on high-dimensional datasets, leading to inaccuracies when predicting
outcomes on unseen test data.
2 Costly in terms of Computational Resources:
The process of creating decision trees involves significant computational costs, especially since each node necessitates sorting.
Additionally, pruning methods require generating and comparing numerous candidate subtrees, further adding to
computational expenses.
3 Dependency between Samples:
These models assume complete independence among training examples. If any relationship exists between samples, the model
might overvalue those specific instances, leading to biased results. Hence, using matched or repeated measurements in training
data is discouraged.
4 Instability:
Decision trees are sensitive to minor variations in data, potentially resulting in completely different tree structures. Utilizing
decision trees within an ensemble helps counter this instability issue.
5 Greedy Approach and Suboptimal Splits:
Decision trees are grown with a greedy algorithm: at each node it picks the split that scores best on the chosen criterion at that
moment and never revisits the choice. A locally best split can therefore produce a suboptimal tree overall, because a split that
looks weak now might have enabled more informative divisions further down.
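A small, hand-made example can make the greedy limitation concrete. In the sketch below (the data and attribute names A, B, C are invented for illustration), the class depends on A and B jointly, so neither shows any information gain on its own, and a one-step greedy search prefers the weaker attribute C.

```python
from collections import Counter
from math import log2

def entropy(labels):
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Entropy reduction from splitting on one attribute (the one-step, greedy view)."""
    total = len(labels)
    remainder = 0.0
    for value in set(r[attr] for r in rows):
        subset = [lab for r, lab in zip(rows, labels) if r[attr] == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder

# XOR-style data: the class is "yes" exactly when A and B differ; C is only weakly related.
rows = [
    {"A": 0, "B": 0, "C": 0}, {"A": 0, "B": 1, "C": 1},
    {"A": 1, "B": 0, "C": 1}, {"A": 1, "B": 1, "C": 1},
]
labels = ["no", "yes", "yes", "no"]
gains = {a: information_gain(rows, labels, a) for a in ("A", "B", "C")}
greedy_choice = max(gains, key=gains.get)
```

A tree that tests A and then B classifies these four samples perfectly in two levels, yet the greedy criterion selects C first because A and B each have zero gain when considered alone.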
Q4. What are ANNs? Explain their capabilities.
ANNs
Artificial Neural Networks (ANNs) are computational models inspired by the human brain's neural structure. They consist of
interconnected nodes (neurons) organized in layers, typically including an input layer, one or more hidden layers, and an
output layer. Each node processes information and transmits signals to nodes in the subsequent layer.
CAPABILITIES OF ANNs
1. Nonlinear Data Representation: ANNs handle complex data patterns due to their high nonlinearity, mimicking real-world
complexities in data generation mechanisms.
2. Learning from Examples: ANNs learn and refine their internal connections by processing sets of training samples, storing
problem-specific knowledge through learned parameters.
3. Adaptability to Changing Environments: These networks can adapt their connections when faced with changing
conditions, facilitating easy retraining to accommodate alterations in their operating environment.
4. Evidential Response in Classification: ANNs not only predict classes but also provide confidence levels for those
predictions, aiding in identifying ambiguous data and enhancing overall classification accuracy.
5. Fault Tolerance and Robustness: ANNs exhibit inherent fault tolerance, maintaining performance even when faced with
neuron disconnections, noisy data, or missing information, though consistency might vary.
6. Consistent Analysis and Design Approach: ANNs utilize a consistent methodology across various domains, offering
standardized principles and notations in their role as information processors.
Q5. Explain ANN feedforward networks and recurrent networks.
The architecture of an Artificial Neural Network (ANN) is shaped by both node characteristics and connectivity parameters
within the network. Nodes possess specific traits, while connectivity determines how nodes are linked in the network. ANNs are
broadly categorized into two architectures: feedforward and recurrent.
Feedforward Networks:
• In a feedforward network, data processing flows strictly from the input side to the output side without any loops or feedback
connections.
• Layers in a feedforward network are structured hierarchically. Nodes within the same layer do not have interconnections, but
outputs from one layer serve as inputs to the subsequent layer.
• This design offers modularity, where nodes within the same layer typically perform identical functions or generate similar
abstractions about input data.
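The layer-by-layer flow of a feedforward network can be sketched as below. The sigmoid activation, the layer sizes, and the weight values are assumptions picked for illustration.

```python
from math import exp

def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

def layer_forward(inputs, weights, biases):
    """One layer: every node applies the same function to the previous layer's outputs."""
    return [sigmoid(sum(w * x for w, x in zip(node_w, inputs)) + b)
            for node_w, b in zip(weights, biases)]

def feedforward(inputs, layers):
    """Pass a signal strictly forward, layer by layer, with no loops."""
    activation = inputs
    for weights, biases in layers:
        activation = layer_forward(activation, weights, biases)
    return activation

# Illustrative weights (assumed values): 2 inputs -> 2 hidden nodes -> 1 output node.
layers = [
    ([[0.5, -0.5], [0.3, 0.8]], [0.0, 0.1]),
    ([[1.0, -1.0]], [0.0]),
]
output = feedforward([1.0, 2.0], layers)
```

Note that nodes in the same layer never read each other's outputs; each layer only consumes the activations of the layer before it.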
Recurrent Networks:
• Recurrent networks involve feedback loops or connections that form cyclic paths within the network.
• These networks include circular connections, often involving delay elements to synchronize feedback.
• Recurrent networks incorporate feedback mechanisms enabling them to handle sequential data or time-series information
effectively.
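The feedback idea can be illustrated with a single recurrent unit whose previous output is fed back as an extra input at the next time step. The tanh activation and the weight values are assumptions for this sketch.

```python
from math import tanh

def recurrent_step(x, h_prev, w_in, w_rec, bias):
    """One time step: the new state depends on the input and the fed-back previous state."""
    return tanh(w_in * x + w_rec * h_prev + bias)

def run_sequence(xs, w_in=0.5, w_rec=0.9, bias=0.0):
    """Process a sequence; h carries the delayed feedback from one step to the next."""
    h = 0.0
    states = []
    for x in xs:
        h = recurrent_step(x, h, w_in, w_rec, bias)
        states.append(h)
    return states

states = run_sequence([1.0, 0.0, 0.0])
no_feedback = run_sequence([1.0, 0.0], w_rec=0.0)
```

With the feedback weight in place, the state stays nonzero after the input has dropped to zero, which is the memory effect that makes such networks suited to sequences. Setting `w_rec=0.0` removes the loop and the state falls back to zero immediately.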
While various neural network models exist in both categories, the multilayer feedforward network with backpropagation
learning is extensively utilized in practical applications due to its effectiveness and applicability across a wide range of problem
domains.
[Figure: Feedforward Networks and Recurrent Networks]
Q6. What is pattern recognition? What types of pattern recognition algorithms are used in machine learning?
Pattern recognition is the process of identifying trends or regularities within a given pattern, either physically,
mathematically, or through algorithms. In machine learning, pattern recognition involves using powerful algorithms to
detect regularities within data. This discipline finds applications in various technological domains such as computer vision,
speech recognition, and face recognition.
TYPES OF PATTERN RECOGNITION ALGORITHMS IN MACHINE LEARNING
1 Supervised Algorithms: These algorithms use a two-stage methodology: model construction and prediction for new or
unseen objects.
Key Features
1. Data partitioning into Training and Test sets.
2. Training the model using algorithms like SVM (Support Vector Machines), decision trees, or random forests.
3. Model training involves learning or recognizing patterns in the data to make predictions.
4. Validation of predictions using the test set.
5. Evaluation of model performance based on correct predictions.
6. The trained model, used for pattern recognition with machine learning, is known as a classifier.
7. Predictions for unseen data are made using this classifier.
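The steps listed above (partition, train, predict, evaluate) can be sketched end to end. The nearest-centroid classifier below is a deliberately simple stand-in chosen for this illustration, not one of the algorithms named in the list, and the data are invented.

```python
def train_test_split(samples, labels, test_every=4):
    """Hold out every test_every-th sample for testing (a simple deterministic split)."""
    train, test = [], []
    for i, pair in enumerate(zip(samples, labels)):
        (test if i % test_every == 0 else train).append(pair)
    return train, test

def fit_centroids(train):
    """Stand-in classifier: store the mean feature vector of each class."""
    sums, counts = {}, {}
    for x, y in train:
        acc = sums.setdefault(y, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

def predict(centroids, x):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    return min(centroids,
               key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])))

samples = [[1.0, 1.1], [0.9, 1.0], [1.2, 0.8], [1.1, 1.2],
           [5.0, 5.1], [4.8, 5.2], [5.2, 4.9], [5.1, 5.0]]
labels = ["low"] * 4 + ["high"] * 4

train, test = train_test_split(samples, labels)
classifier = fit_centroids(train)                      # model construction
correct = sum(predict(classifier, x) == y for x, y in test)
accuracy = correct / len(test)                         # evaluation on held-out data
```

The same `predict` call is what would be applied to genuinely unseen objects once the classifier has been validated.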
2 Unsupervised Algorithms: These algorithms do not rely on labeled data. Instead, they group items based
on similarities in their features without prior information.
Clustering Concept
1. Clustering involves grouping items with similar features.
2. No previous knowledge is available for identifying new items.
3. Machine learning algorithms like hierarchical and k-means clustering are used.
4. New objects are assigned to a group based on their features or properties to make predictions.
These methods offer distinct approaches: supervised algorithms use labeled data for training and testing, while
unsupervised algorithms rely on similarity in features for clustering without prior information or labeled data.
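As an illustration of the clustering concept, here is a minimal k-means sketch. The starting centroids are supplied by hand and the data are invented; real implementations usually choose starting points more carefully and stop when assignments no longer change.

```python
def closest(point, centroids):
    """Index of the nearest centroid by squared Euclidean distance."""
    return min(range(len(centroids)),
               key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])))

def k_means(points, centroids, iterations=10):
    """Alternate between assigning points to centroids and recomputing centroids."""
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for p in points:
            groups[closest(p, centroids)].append(p)
        centroids = [
            [sum(dim) / len(g) for dim in zip(*g)] if g else c   # empty cluster stays put
            for g, c in zip(groups, centroids)
        ]
    return centroids

points = [[1.0, 1.0], [1.2, 0.8], [0.8, 1.2], [5.0, 5.0], [5.2, 4.8], [4.8, 5.2]]
centroids = k_means(points, centroids=[[0.0, 0.0], [6.0, 6.0]])
new_item_cluster = closest([4.9, 5.1], centroids)
```

No labels are used anywhere: the groups emerge from feature similarity alone, and a new object is assigned to whichever group's centroid it is nearest to.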
Q7. Write a note on C4.5-DR.
• C4.5-DR stands for "C4.5 Decision Rules." It's an extension of the C4.5 algorithm, which is a well-known algorithm for
constructing decision trees. C4.5 builds decision trees from a given set of training data to predict the class labels of instances.
• The C4.5-DR extends the functionality of C4.5 by generating not only decision trees but also decision rules from the learned
trees. These decision rules are derived from the tree structure and provide a more human-understandable representation of
the decision-making process.
• The rules produced by C4.5-DR are essentially a set of if-then conditions based on the attributes or features of the dataset.
These rules aim to capture patterns and relationships present in the data to help classify or predict the target variables. They
are easier to interpret than complex tree structures, making them useful for providing transparent insights into the decision-
making process of the algorithm.
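The idea of reading if-then rules off a tree can be sketched as below. This shows only the path-to-rule step on a hand-written, hypothetical tree; it does not reproduce any rule simplification or ordering that a full rule generator may perform.

```python
# Hypothetical learned tree: (attribute test, {outcome: subtree or class label}).
# The tests and leaf labels are illustrative assumptions.
TREE = ("X > 1", {
    "no": "Class1",
    "yes": ("Y", {"A": "Class1", "B": "Class2", "C": "Class2"}),
})

def extract_rules(node, conditions=()):
    """Turn every root-to-leaf path into one if-then rule."""
    if isinstance(node, str):                      # leaf: emit the accumulated rule
        antecedent = " AND ".join(conditions) or "TRUE"
        return [f"IF {antecedent} THEN {node}"]
    test, branches = node
    rules = []
    for outcome, child in branches.items():
        rules.extend(extract_rules(child, conditions + (f"({test}) = {outcome}",)))
    return rules

rules = extract_rules(TREE)
for rule in rules:
    print(rule)
```

Each printed line corresponds to exactly one leaf, so the rule set covers the same cases as the tree it was derived from.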
Q8. Write a note on the C4.5 decision tree algorithm.
• The C4.5 algorithm constructs decision trees by recursively choosing the best attribute to split the data based on the gain
ratio (information gain normalized by split information).
• It creates a tree by dividing the dataset into subsets, stopping when certain criteria are met, and then prunes the tree to
prevent overfitting.
• C4.5 is used for classification tasks. It handles both categorical and continuous attributes and generates understandable
trees, but it can be computationally demanding and prone to overfitting on complex datasets.
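The gain ratio criterion can be illustrated with a short calculation for categorical attributes. The toy data and attribute names below are invented; the point is that an attribute with many distinct values is penalized by its large split information.

```python
from collections import Counter
from math import log2

def entropy(values):
    total = len(values)
    return -sum((c / total) * log2(c / total) for c in Counter(values).values())

def gain_ratio(attr_values, labels):
    """Information gain divided by split information for one categorical attribute."""
    total = len(labels)
    remainder = 0.0
    for value in set(attr_values):
        subset = [lab for v, lab in zip(attr_values, labels) if v == value]
        remainder += len(subset) / total * entropy(subset)
    gain = entropy(labels) - remainder
    split_info = entropy(attr_values)          # entropy of the partition sizes
    return gain / split_info if split_info > 0 else 0.0

labels = ["yes", "yes", "no", "no"]
outlook = ["sun", "sun", "rain", "rain"]      # separates the classes with two branches
row_id = ["r1", "r2", "r3", "r4"]             # unique per row: pure subsets, but useless

ratio_outlook = gain_ratio(outlook, labels)
ratio_row_id = gain_ratio(row_id, labels)
```

Both attributes have the same raw information gain here, but the identifier-like attribute scores only half the gain ratio, so the two-valued attribute is preferred.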
Q9. CART algorithm and Gini Index.
• CART (Classification and Regression Trees) is a decision tree algorithm used for both classification and regression tasks. It
constructs binary trees by recursively splitting the dataset based on feature thresholds that minimize impurity in the
resulting child nodes.
• Gini Index, a measure of impurity in decision trees, quantifies the probability of incorrectly classifying a randomly chosen
element if it was labeled according to the distribution of labels in the node. It evaluates the homogeneity of the dataset;
lower Gini values imply purer nodes and better separation of classes within a node. CART uses the Gini Index as a criterion
for determining the best split while growing the tree.
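For a node with class proportions p_i, the Gini index is 1 minus the sum of the squared p_i. The sketch below computes it and uses it to pick a binary threshold on one numeric feature; the data and the midpoint search are illustrative assumptions rather than a full CART implementation.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions in a node."""
    total = len(labels)
    return 1.0 - sum((c / total) ** 2 for c in Counter(labels).values())

def gini_of_split(left, right):
    """Size-weighted impurity of the two children of a binary split."""
    total = len(left) + len(right)
    return len(left) / total * gini(left) + len(right) / total * gini(right)

def best_threshold(values, labels):
    """Try midpoints between sorted distinct values; return (threshold, impurity)."""
    distinct = sorted(set(values))
    best = None
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [lab for v, lab in zip(values, labels) if v <= t]
        right = [lab for v, lab in zip(values, labels) if v > t]
        score = gini_of_split(left, right)
        if best is None or score < best[1]:
            best = (t, score)
    return best

ages = [22, 25, 30, 41, 47, 52]
bought = ["no", "no", "no", "yes", "yes", "yes"]
parent_impurity = gini(bought)
threshold, impurity = best_threshold(ages, bought)
```

The parent node is maximally mixed for two classes, while the chosen threshold yields two pure children, which is the kind of impurity reduction the split search looks for.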
Q10. Explain MLPs.
• MLPs, or Multilayer Perceptrons, are a type of feedforward artificial neural network consisting of multiple layers of nodes or
neurons. They are structured with an input layer, one or more hidden layers, and an output layer. Each node in a layer is
connected to every node in the subsequent layer.
• In an MLP, information moves in a forward direction, passing through the network layer by layer. Neurons in the hidden
layers use activation functions to process inputs and pass their outputs to the next layer. These networks are trained using
supervised learning methods like backpropagation, adjusting the weights and biases to minimize the difference between
predicted and actual outputs.
• MLPs are versatile and capable of learning complex patterns in data, making them widely used in various machine learning
tasks such as classification, regression, and pattern recognition. They have been successfully applied in diverse fields,
including finance, healthcare, natural language processing, and image recognition.
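The weight adjustment performed by backpropagation can be sketched for a tiny MLP trained on a single example. The network size, sigmoid activations, squared-error loss, learning rate, and starting weights are all assumptions made for this illustration.

```python
from math import exp

def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

def forward(x, params):
    """Forward pass of a 2-input, 2-hidden-node, 1-output MLP with sigmoid activations."""
    w_hidden, b_hidden, w_out, b_out = params
    hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b)
              for ws, b in zip(w_hidden, b_hidden)]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)
    return hidden, output

def train_step(x, target, params, lr=0.5):
    """One backpropagation update minimizing 0.5 * (output - target) ** 2."""
    w_hidden, b_hidden, w_out, b_out = params
    hidden, output = forward(x, params)
    # Error signal at the output node (chain rule through the sigmoid).
    delta_out = (output - target) * output * (1 - output)
    # Error signals propagated back to each hidden node.
    delta_hidden = [delta_out * w * h * (1 - h) for w, h in zip(w_out, hidden)]
    new_w_out = [w - lr * delta_out * h for w, h in zip(w_out, hidden)]
    new_b_out = b_out - lr * delta_out
    new_w_hidden = [[w - lr * d * xi for w, xi in zip(ws, x)]
                    for ws, d in zip(w_hidden, delta_hidden)]
    new_b_hidden = [b - lr * d for b, d in zip(b_hidden, delta_hidden)]
    return new_w_hidden, new_b_hidden, new_w_out, new_b_out

def loss(x, target, params):
    return 0.5 * (forward(x, params)[1] - target) ** 2

# Illustrative starting weights (assumed values).
params = ([[0.1, -0.2], [0.4, 0.3]], [0.0, 0.0], [0.5, -0.5], 0.0)
x, target = [1.0, 0.0], 1.0
loss_before = loss(x, target, params)
for _ in range(20):
    params = train_step(x, target, params)
loss_after = loss(x, target, params)
```

After the repeated updates the output has moved toward the target, so the loss on this example is lower than at the start. Real training loops iterate over many examples rather than one.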
Q11. Write a note on Competitive networks and competitive learning
Competitive networks, based on competitive learning, are a type of neural network model used for unsupervised learning
tasks such as clustering and vector quantization. They are composed of nodes that compete against each other to represent
input data most effectively. Kohonen's Self-Organizing Maps (SOM) are a popular example of competitive networks.
In competitive learning, nodes or neurons in the network compete to become activated or respond to specific input patterns.
The main features of competitive learning are:
1. Competition: Neurons in the network compete to become the most active or responsive based on the input data. The neuron
that best matches or represents the input is selected.
2. Cooperation: While neurons compete, they also cooperate to collectively represent the input patterns. Neighboring neurons
may also update their weights to adapt to similar input patterns.
3. Adaptation: Neurons adjust their weights in response to the input data. The winning neuron (the one most responsive to the
input) updates its weights to better represent that input, while its neighbors might make smaller adjustments.
Competitive networks like SOMs create a topological mapping of input data onto a lower-dimensional grid or manifold. They
organize the input space in a way that reveals the inherent structure and relationships among the data points, allowing for
tasks such as clustering, visualization, and data compression.
These networks have applications in various fields, including data visualization, pattern recognition, and exploratory data
analysis, where understanding the underlying structure of complex data is essential.
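The competition and adaptation steps can be sketched in their simplest winner-take-all form, where only the winning neuron is updated. Neighbor cooperation is left out of this sketch, and the starting weights, inputs, and learning rate are assumed values.

```python
def winner(weights, x):
    """Index of the neuron whose weight vector is closest to the input."""
    return min(range(len(weights)),
               key=lambda i: sum((w - xi) ** 2 for w, xi in zip(weights[i], x)))

def competitive_update(weights, x, lr=0.5):
    """Winner-take-all step: only the winning neuron moves toward the input."""
    k = winner(weights, x)
    weights[k] = [w + lr * (xi - w) for w, xi in zip(weights[k], x)]
    return k

# Two neurons with assumed starting weights; the inputs form two loose groups.
weights = [[0.0, 0.0], [1.0, 1.0]]
inputs = [[0.1, 0.2], [0.9, 0.8], [0.2, 0.1], [0.8, 0.9]]
winners = [competitive_update(weights, x) for x in inputs]
```

Each neuron ends up representing one group of inputs, which is how competitive learning performs clustering and vector quantization without labels.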
Q12. What are SOMs? Explain their features and applications.
SOMs stand for Self-Organizing Maps, which are a type of artificial neural network used for unsupervised learning and data
visualization. SOMs are a class of competitive learning networks developed by Teuvo Kohonen. They organize high-dimensional
input data into a low-dimensional, usually 2D or 3D, grid or map.
The primary purpose of SOMs is to represent complex, high-dimensional data in a way that reflects the underlying structure and
relationships among the data points. These networks consist of nodes or neurons arranged in a lattice-like structure, where each
neuron is associated with a weight vector of the same dimensionality as the input data.
KEY FEATURES OF SOMS
• Topological Preservation: SOMs maintain the topology of the input space in the map. This means that nearby data points in
the input space will be mapped to neighboring neurons in the SOM.
• Competitive Learning: Neurons compete with each other to become activated based on the similarity between their weight
vectors and the input data. The neuron that best matches or is most responsive to the input becomes the winning neuron.
• Neighbourhood Cooperation: In SOMs, neighboring neurons also adapt their weights, albeit to a lesser extent than the
winning neuron. This cooperative learning helps in preserving the topological structure and smooth mapping of the input
space.
• Dimensionality Reduction: SOMs reduce the dimensionality of the input space while preserving the inherent relationships
among data points. This reduction facilitates visualization and understanding of complex data.
APPLICATIONS of SOMs include data visualization, clustering, pattern recognition, and exploratory data analysis. They are widely
used in various fields such as image analysis, natural language processing, recommendation systems, and data mining to reveal
underlying patterns and structures within datasets.
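A single SOM training step, combining competition with neighborhood cooperation, can be sketched as follows. For brevity the map is one-dimensional with four neurons, and the Gaussian neighborhood, learning rate, radius, and starting weights are assumptions; practical SOMs usually use a 2D grid and shrink the radius and learning rate over time.

```python
from math import exp

def best_matching_unit(grid, x):
    """Grid position of the neuron whose weight vector is closest to the input."""
    return min(range(len(grid)),
               key=lambda i: sum((w - xi) ** 2 for w, xi in zip(grid[i], x)))

def som_step(grid, x, lr=0.5, radius=1.0):
    """Move the winner and, more weakly, its grid neighbors toward the input."""
    bmu = best_matching_unit(grid, x)
    for i, weights in enumerate(grid):
        # Gaussian neighborhood: 1.0 at the winner, decaying with grid distance.
        influence = exp(-((i - bmu) ** 2) / (2 * radius ** 2))
        grid[i] = [w + lr * influence * (xi - w) for w, xi in zip(weights, x)]
    return bmu

# A 1-D map of 4 neurons with 2-D weight vectors (assumed starting values).
grid = [[0.0, 0.0], [0.3, 0.3], [0.6, 0.6], [0.9, 0.9]]
before = [w[:] for w in grid]
bmu = som_step(grid, [1.0, 1.0])
# Fraction of the remaining distance to the input that each neuron covered.
fractions = [(a[0] - b[0]) / (1.0 - b[0]) for a, b in zip(grid, before)]
```

The winner covers the largest fraction of its distance to the input, and the fraction shrinks for neurons farther away on the grid, which is what keeps neighboring neurons tuned to similar inputs.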
Copyright © 2023 Jayanti Rajdevendra Pande.
All rights reserved.
This content may be printed for personal use only. It may not be copied, distributed, or used for any other purpose
without the express written permission of the copyright owner.
This content is protected by copyright law. Any unauthorized use of the content may violate copyright laws and
other applicable laws.
For any further queries contact on email: jayantipande17@gmail.com
Image credits :
Feedforward networks by kiprono Elijah koech , Published in Towards Data Science
Recurrent Networks by Dinesh on Medium

More Related Content

Similar to Data Mining Module 3 Business Analtics..pdf

IRJET- A Detailed Study on Classification Techniques for Data Mining
IRJET- A Detailed Study on Classification Techniques for Data MiningIRJET- A Detailed Study on Classification Techniques for Data Mining
IRJET- A Detailed Study on Classification Techniques for Data MiningIRJET Journal
 
Decision treeinductionmethodsandtheirapplicationtobigdatafinal 5
Decision treeinductionmethodsandtheirapplicationtobigdatafinal 5Decision treeinductionmethodsandtheirapplicationtobigdatafinal 5
Decision treeinductionmethodsandtheirapplicationtobigdatafinal 5ssuser33da69
 
Load_Forecastinglfviuguuyihonrekgdbgr.pptx
Load_Forecastinglfviuguuyihonrekgdbgr.pptxLoad_Forecastinglfviuguuyihonrekgdbgr.pptx
Load_Forecastinglfviuguuyihonrekgdbgr.pptxDEEPAKCHAURASIYA37
 
Deep vs diverse architectures for classification problems
Deep vs diverse architectures for classification problemsDeep vs diverse architectures for classification problems
Deep vs diverse architectures for classification problemsColleen Farrelly
 
The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)theijes
 
Decision Tree Machine Learning Detailed Explanation.
Decision Tree Machine Learning Detailed Explanation.Decision Tree Machine Learning Detailed Explanation.
Decision Tree Machine Learning Detailed Explanation.DrezzingGaming
 
A Decision Tree Based Classifier for Classification & Prediction of Diseases
A Decision Tree Based Classifier for Classification & Prediction of DiseasesA Decision Tree Based Classifier for Classification & Prediction of Diseases
A Decision Tree Based Classifier for Classification & Prediction of Diseasesijsrd.com
 
Utilizing Machine Learning, Detect Chronic Kidney Disease and Suggest A Healt...
Utilizing Machine Learning, Detect Chronic Kidney Disease and Suggest A Healt...Utilizing Machine Learning, Detect Chronic Kidney Disease and Suggest A Healt...
Utilizing Machine Learning, Detect Chronic Kidney Disease and Suggest A Healt...IRJET Journal
 
An Efficient PSO Based Ensemble Classification Model on High Dimensional Data...
An Efficient PSO Based Ensemble Classification Model on High Dimensional Data...An Efficient PSO Based Ensemble Classification Model on High Dimensional Data...
An Efficient PSO Based Ensemble Classification Model on High Dimensional Data...ijsc
 
AN EFFICIENT PSO BASED ENSEMBLE CLASSIFICATION MODEL ON HIGH DIMENSIONAL DATA...
AN EFFICIENT PSO BASED ENSEMBLE CLASSIFICATION MODEL ON HIGH DIMENSIONAL DATA...AN EFFICIENT PSO BASED ENSEMBLE CLASSIFICATION MODEL ON HIGH DIMENSIONAL DATA...
AN EFFICIENT PSO BASED ENSEMBLE CLASSIFICATION MODEL ON HIGH DIMENSIONAL DATA...ijsc
 
Hypothesis on Different Data Mining Algorithms
Hypothesis on Different Data Mining AlgorithmsHypothesis on Different Data Mining Algorithms
Hypothesis on Different Data Mining AlgorithmsIJERA Editor
 
Cancer data partitioning with data structure and difficulty independent clust...
Cancer data partitioning with data structure and difficulty independent clust...Cancer data partitioning with data structure and difficulty independent clust...
Cancer data partitioning with data structure and difficulty independent clust...IRJET Journal
 
Review of Algorithms for Crime Analysis & Prediction
Review of Algorithms for Crime Analysis & PredictionReview of Algorithms for Crime Analysis & Prediction
Review of Algorithms for Crime Analysis & PredictionIRJET Journal
 
IMAGE CLASSIFICATION USING DIFFERENT CLASSICAL APPROACHES
IMAGE CLASSIFICATION USING DIFFERENT CLASSICAL APPROACHESIMAGE CLASSIFICATION USING DIFFERENT CLASSICAL APPROACHES
IMAGE CLASSIFICATION USING DIFFERENT CLASSICAL APPROACHESVikash Kumar
 
Scalable decision tree based on fuzzy partitioning and an incremental approach
Scalable decision tree based on fuzzy partitioning and an  incremental approachScalable decision tree based on fuzzy partitioning and an  incremental approach
Scalable decision tree based on fuzzy partitioning and an incremental approachIJECEIAES
 

Similar to Data Mining Module 3 Business Analtics..pdf (20)

decisiontrees.ppt
decisiontrees.pptdecisiontrees.ppt
decisiontrees.ppt
 
IRJET- A Detailed Study on Classification Techniques for Data Mining
IRJET- A Detailed Study on Classification Techniques for Data MiningIRJET- A Detailed Study on Classification Techniques for Data Mining
IRJET- A Detailed Study on Classification Techniques for Data Mining
 
Decision treeinductionmethodsandtheirapplicationtobigdatafinal 5
Decision treeinductionmethodsandtheirapplicationtobigdatafinal 5Decision treeinductionmethodsandtheirapplicationtobigdatafinal 5
Decision treeinductionmethodsandtheirapplicationtobigdatafinal 5
 
Hx3115011506
Hx3115011506Hx3115011506
Hx3115011506
 
Load_Forecastinglfviuguuyihonrekgdbgr.pptx
Load_Forecastinglfviuguuyihonrekgdbgr.pptxLoad_Forecastinglfviuguuyihonrekgdbgr.pptx
Load_Forecastinglfviuguuyihonrekgdbgr.pptx
 
Deep vs diverse architectures for classification problems
Deep vs diverse architectures for classification problemsDeep vs diverse architectures for classification problems
Deep vs diverse architectures for classification problems
 
The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)
 
Decision Tree Machine Learning Detailed Explanation.
Decision Tree Machine Learning Detailed Explanation.Decision Tree Machine Learning Detailed Explanation.
Decision Tree Machine Learning Detailed Explanation.
 
A Decision Tree Based Classifier for Classification & Prediction of Diseases
A Decision Tree Based Classifier for Classification & Prediction of DiseasesA Decision Tree Based Classifier for Classification & Prediction of Diseases
A Decision Tree Based Classifier for Classification & Prediction of Diseases
 
Utilizing Machine Learning, Detect Chronic Kidney Disease and Suggest A Healt...
Utilizing Machine Learning, Detect Chronic Kidney Disease and Suggest A Healt...Utilizing Machine Learning, Detect Chronic Kidney Disease and Suggest A Healt...
Utilizing Machine Learning, Detect Chronic Kidney Disease and Suggest A Healt...
 
An Efficient PSO Based Ensemble Classification Model on High Dimensional Data...
An Efficient PSO Based Ensemble Classification Model on High Dimensional Data...An Efficient PSO Based Ensemble Classification Model on High Dimensional Data...
An Efficient PSO Based Ensemble Classification Model on High Dimensional Data...
 
AN EFFICIENT PSO BASED ENSEMBLE CLASSIFICATION MODEL ON HIGH DIMENSIONAL DATA...
AN EFFICIENT PSO BASED ENSEMBLE CLASSIFICATION MODEL ON HIGH DIMENSIONAL DATA...AN EFFICIENT PSO BASED ENSEMBLE CLASSIFICATION MODEL ON HIGH DIMENSIONAL DATA...
AN EFFICIENT PSO BASED ENSEMBLE CLASSIFICATION MODEL ON HIGH DIMENSIONAL DATA...
 
Hypothesis on Different Data Mining Algorithms
Hypothesis on Different Data Mining AlgorithmsHypothesis on Different Data Mining Algorithms
Hypothesis on Different Data Mining Algorithms
 
PNN and inversion-B
PNN and inversion-BPNN and inversion-B
PNN and inversion-B
 
Data discretization
Data discretizationData discretization
Data discretization
 
L016136369
L016136369L016136369
L016136369
 
Cancer data partitioning with data structure and difficulty independent clust...
Cancer data partitioning with data structure and difficulty independent clust...Cancer data partitioning with data structure and difficulty independent clust...
Cancer data partitioning with data structure and difficulty independent clust...
 
Review of Algorithms for Crime Analysis & Prediction
Review of Algorithms for Crime Analysis & PredictionReview of Algorithms for Crime Analysis & Prediction
Review of Algorithms for Crime Analysis & Prediction
 
IMAGE CLASSIFICATION USING DIFFERENT CLASSICAL APPROACHES
IMAGE CLASSIFICATION USING DIFFERENT CLASSICAL APPROACHESIMAGE CLASSIFICATION USING DIFFERENT CLASSICAL APPROACHES
IMAGE CLASSIFICATION USING DIFFERENT CLASSICAL APPROACHES
 
Scalable decision tree based on fuzzy partitioning and an incremental approach
Scalable decision tree based on fuzzy partitioning and an  incremental approachScalable decision tree based on fuzzy partitioning and an  incremental approach
Scalable decision tree based on fuzzy partitioning and an incremental approach
 

More from Jayanti Pande

Web & Social Media Analytics Module 5.pdf
Web & Social Media Analytics Module 5.pdfWeb & Social Media Analytics Module 5.pdf
Web & Social Media Analytics Module 5.pdfJayanti Pande
 
Web & Social Media Analytics Module 4.pdf
Web & Social Media Analytics Module 4.pdfWeb & Social Media Analytics Module 4.pdf
Web & Social Media Analytics Module 4.pdfJayanti Pande
 
Web & Social Media Analytics Module 3.pdf
Web & Social Media Analytics Module 3.pdfWeb & Social Media Analytics Module 3.pdf
Web & Social Media Analytics Module 3.pdfJayanti Pande
 
Web & Social Media Analytics Module 2.pdf
Web & Social Media Analytics Module 2.pdfWeb & Social Media Analytics Module 2.pdf
Web & Social Media Analytics Module 2.pdfJayanti Pande
 
Web & Social Media Analytics Module 1.pdf
Web & Social Media Analytics Module 1.pdfWeb & Social Media Analytics Module 1.pdf
Web & Social Media Analytics Module 1.pdfJayanti Pande
 
Basics of Research| Also Valuable for MBA Research Project Viva.pdf
Basics of Research| Also Valuable for MBA Research Project Viva.pdfBasics of Research| Also Valuable for MBA Research Project Viva.pdf
Basics of Research| Also Valuable for MBA Research Project Viva.pdfJayanti Pande
 
PERFORMANCE MEASUREMENT SYSTEM [HR Paper 2] Module 5.pdf
PERFORMANCE MEASUREMENT SYSTEM [HR Paper 2] Module 5.pdfPERFORMANCE MEASUREMENT SYSTEM [HR Paper 2] Module 5.pdf
PERFORMANCE MEASUREMENT SYSTEM [HR Paper 2] Module 5.pdfJayanti Pande
 
PERFORMANCE MEASUREMENT SYSTEM [HR Paper 2 ] Module 4.pdf
PERFORMANCE MEASUREMENT SYSTEM [HR Paper 2 ] Module 4.pdfPERFORMANCE MEASUREMENT SYSTEM [HR Paper 2 ] Module 4.pdf
PERFORMANCE MEASUREMENT SYSTEM [HR Paper 2 ] Module 4.pdfJayanti Pande
 
Data Mining Module 3 Business Analtics..pdf

  • 1. Copyright © 2023 Jayanti Rajdevendra Pande. All rights reserved. RASHTRASANT TUKDOJI MAHARAJ NAGPUR UNIVERSITY MBA SEMESTER: 3 SPECIALIZATION BUSINESS ANALYTICS (BA 2) SUBJECT DATA MINING MODULE NO : 3 DECISION TREES & DECISION RULES - Jayanti R Pande DGICM College, Nagpur
  • 2. Q1. What are Decision Trees? Explain their applications.
• DECISION TREES are supervised learning methods used for constructing models from input-output samples.
• They handle both classification and regression tasks.
• These models have a hierarchical structure, formed through recursive splits at decision nodes using test functions.
• Typically, decision trees follow a top-down strategy to search for solutions within the dataset.
• Nodes within decision trees test attributes, often using a univariate approach that assesses a single attribute per node.
• The branches stemming from a node depict the outcomes of its attribute test, partitioning the data into subsets.
• For example, in a scenario with attributes X and Y, samples meeting conditions like X > 1 and Y = B might belong to a specific class.
• Algorithmically, decision trees select attributes to partition samples and create branches, sorting data into the respective child nodes based on attribute values.
• Each path from the root to a leaf node represents a classification rule.
• The choice of attribute at each node significantly affects the tree's structure and predictive capacity.
APPLICATIONS OF DECISION TREES
• Classification Tasks: Decision trees are extensively used in classification tasks, where they categorize data into distinct classes based on input features. Applications include customer segmentation in marketing, fraud detection in finance, and medical diagnosis in healthcare.
• Regression Analysis: They are applied in regression analysis to predict continuous values, making them valuable for forecasting, pricing strategies, and trend predictions. Examples include sales forecasting, risk assessment in finance, and predicting housing prices in real estate.
• Feature Selection: Decision trees help in identifying and selecting the most relevant and influential features affecting the target variable. They are widely used for feature selection in various machine learning models to improve performance and efficiency.
  • 3. A simple decision tree for classification of samples with two input attributes X and Y is given in the figure below. All samples with feature values X > 1 and Y = B belong to Class2, while the samples with values X < 1 belong to Class1, whatever the value for feature Y. The samples at a nonleaf node in the tree structure are thus partitioned along the branches, and each child node gets its corresponding subset of samples.
[Figure: decision tree with test nodes "X > 1" (branches Yes / No) and "Y = ?" (branches Y = A, Y = B, Y = C), and leaves labelled CLASS 1 and CLASS 2]
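The two rules stated for this example tree can be written as a small function. This is only a sketch: the name `classify` is hypothetical, the text does not say which class the Y = A and Y = C leaves carry, and treating X = 1 like X < 1 is an assumption based on the "No" branch of the X > 1 test.

```python
from typing import Optional


def classify(x: float, y: str) -> Optional[str]:
    """Apply only the rules that the text states for the example tree."""
    if x > 1:
        if y == "B":
            return "Class2"
        # The leaves for Y = A and Y = C are not stated in the text,
        # so no label is returned for them.
        return None
    # Assumption: the "No" branch of the X > 1 test (including X == 1)
    # leads to Class1, whatever the value of Y.
    return "Class1"
```

Each `return` corresponds to one root-to-leaf path, which is why a path through the tree can be read as a classification rule.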
  • 4. Q2. How is decision tree pruning done?
DECISION TREE PRUNING is a technique used to simplify decision trees by removing one or more subtrees and replacing them with leaves. This process aims to reduce complexity, improve comprehensibility, and potentially enhance the predictive accuracy of the model. Pruning involves the following methodologies:
• Pre-pruning: The decision is made before a node is split, based on predefined conditions or statistical tests such as the χ2 test. A stopping criterion determines whether to continue splitting a node. If a split brings no significant improvement in classification accuracy, the current node is represented as a leaf node, preventing further division.
• Post-pruning: Parts of the tree structure are removed retrospectively, based on selected accuracy criteria, after the entire tree has been constructed. The decision to prune subtrees is made by assessing the contribution of each subtree to the classification accuracy on unseen testing samples.
To estimate the predictive error rate accurately, additional techniques such as cross-validation or a separate test dataset are employed. Cross-validation involves dividing the available samples into blocks, constructing the tree from all except one block, and testing it on the remaining block. This iterative process helps in assessing the tree's performance and identifying parts that can be pruned to simplify the model without compromising predictive accuracy on unseen data.
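The cross-validation procedure described above (build on all blocks but one, test on the held-out block) might be sketched as follows. The function names are illustrative, and `build_tree` and `count_errors` are hypothetical callables supplied by the caller; the slides do not define them.

```python
from typing import Iterator, List, Sequence, Tuple


def cross_validation_blocks(samples: Sequence, k: int) -> Iterator[Tuple[List, List]]:
    """Yield (training, testing) pairs, holding out one block at a time."""
    if not 2 <= k <= len(samples):
        raise ValueError("k must be between 2 and the number of samples")
    # Deal samples round-robin so block sizes differ by at most one.
    blocks = [list(samples[i::k]) for i in range(k)]
    for held_out in range(k):
        training = [s for i, block in enumerate(blocks) if i != held_out for s in block]
        yield training, blocks[held_out]


def estimated_error_rate(samples, k, build_tree, count_errors) -> float:
    """Total errors on the held-out blocks divided by the number of samples.

    build_tree(training) and count_errors(tree, testing) are hypothetical
    callables; they stand in for whatever tree learner is being evaluated.
    """
    errors = sum(
        count_errors(build_tree(training), testing)
        for training, testing in cross_validation_blocks(samples, k)
    )
    return errors / len(samples)
```

Comparing this estimate for a full tree and for a pruned candidate is one way to decide whether a subtree contributes to accuracy on unseen samples.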
  • 5. Q3. Explain the limitations of Decision Trees and Decision Rules.
Decision-rule and decision-tree-based models offer simplicity, speed, and readability, without relying on stringent assumptions about data distribution or attribute independence, making them generally robust across tasks. However, these methods come with certain limitations:
1. Not Suitable for Regression: These models, particularly decision trees, might not perform well in regression tasks. They are prone to overfitting on high-dimensional datasets, leading to inaccuracies when predicting outcomes on unseen test data.
2. Costly in Terms of Computational Resources: Creating decision trees involves significant computational cost, especially since each node necessitates sorting. Additionally, pruning methods require generating and comparing numerous candidate subtrees, further adding to the expense.
3. Dependency between Samples: These models assume complete independence among training examples. If any relationship exists between samples, the model might overvalue those specific instances, leading to biased results. Hence, using matched or repeated measurements in training data is discouraged.
4. Instability: Decision trees are sensitive to minor variations in data, potentially resulting in completely different tree structures. Using decision trees within an ensemble helps counter this instability.
5. Greedy Approach and Suboptimal Splits: Decision trees are built with a greedy algorithm for binary tree formation. This approach might result in suboptimal splits, as it selects the best split based on a single criterion at each step, potentially disregarding more informative pathways for division.
  • 6. Q4. What are ANNs? Explain their capabilities.
Artificial Neural Networks (ANNs) are computational models inspired by the human brain's neural structure. They consist of interconnected nodes (neurons) organized in layers, typically including an input layer, one or more hidden layers, and an output layer. Each node processes information and transmits signals to nodes in the subsequent layer.
CAPABILITIES OF ANNs
1. Nonlinear Data Representation: ANNs handle complex data patterns due to their high nonlinearity, mimicking real-world complexities in data generation mechanisms.
2. Learning from Examples: ANNs learn and refine their internal connections by processing sets of training samples, storing problem-specific knowledge in learned parameters.
3. Adaptability to Changing Environments: These networks can adapt their connections when conditions change, so they can easily be retrained to accommodate alterations in their operating environment.
4. Evidential Response in Classification: ANNs not only predict classes but also provide confidence levels for those predictions, aiding in identifying ambiguous data and enhancing overall classification accuracy.
5. Fault Tolerance and Robustness: ANNs exhibit inherent fault tolerance, maintaining performance even when faced with neuron disconnections, noisy data, or missing information, though consistency might vary.
6. Consistent Analysis and Design Approach: ANNs use a consistent methodology across various domains, offering standardized principles and notations in their role as information processors.
  • 7. Q5. Explain feedforward networks and recurrent networks in ANNs.
The architecture of an Artificial Neural Network (ANN) is shaped by both node characteristics and connectivity parameters within the network. Nodes possess specific traits, while connectivity determines how nodes are linked in the network. ANNs are broadly categorized into two architectures: feedforward and recurrent.
Feedforward Networks:
• In a feedforward network, data processing flows strictly from the input side to the output side without any loops or feedback connections.
• Layers in a feedforward network are structured hierarchically. Nodes within the same layer do not have interconnections, but outputs from one layer serve as inputs to the subsequent layer.
• This design offers modularity, where nodes within the same layer typically perform identical functions or generate similar abstractions about input data.
Recurrent Networks:
• Recurrent networks involve feedback loops or connections that form cyclic paths within the network.
• These networks include circular connections, often involving delay elements to synchronize feedback.
• The feedback mechanisms enable recurrent networks to handle sequential data or time-series information effectively.
While various neural network models exist in both categories, the multilayer feedforward network with backpropagation learning is extensively used in practical applications due to its effectiveness and applicability across a wide range of problem domains.
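The difference between the two architectures can be made concrete with a minimal sketch. The sigmoid and tanh activations, the single-node recurrent example, and all function names are assumptions chosen for illustration; the slides do not prescribe them.

```python
import math
from typing import List, Sequence, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weights, biases)


def layer_output(inputs: Sequence[float], weights: List[List[float]], biases: List[float]) -> List[float]:
    """One fully connected layer with a sigmoid activation (an assumed choice)."""
    return [
        1.0 / (1.0 + math.exp(-(sum(w * x for w, x in zip(row, inputs)) + b)))
        for row, b in zip(weights, biases)
    ]


def feedforward(inputs: Sequence[float], layers: List[Layer]) -> List[float]:
    """Data flows strictly from input to output; no value is ever fed back."""
    activation = list(inputs)
    for weights, biases in layers:
        activation = layer_output(activation, weights, biases)
    return activation


def recurrent(sequence: Sequence[float], w_in: float, w_back: float, bias: float = 0.0) -> List[float]:
    """A single recurrent node: its previous output is fed back with a one-step delay."""
    state = 0.0
    outputs = []
    for x in sequence:
        state = math.tanh(w_in * x + w_back * state + bias)
        outputs.append(state)
    return outputs
```

In `feedforward` the output depends only on the current input, whereas in `recurrent` the `state` variable carries information from earlier items of the sequence into later ones.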
  • 8. [Figures: Feedforward Networks; Recurrent Networks]
  • 9. Q6. What is pattern recognition? What types of pattern recognition algorithms are used in machine learning?
Pattern recognition is the process of identifying trends or regularities in data, whether physically, mathematically, or through algorithms. In machine learning, pattern recognition involves using powerful algorithms to detect regularities within data. This discipline finds applications in various technological domains such as computer vision, speech recognition, and face recognition.
TYPES OF PATTERN RECOGNITION ALGORITHMS IN MACHINE LEARNING
1 Supervised Algorithms: These algorithms use a two-stage methodology: model construction, then prediction for new or unseen objects.
Key Features
1. Data is partitioned into Training and Test sets.
2. The model is trained using algorithms like SVM (Support Vector Machines), decision trees, or random forests.
3. Model training involves learning or recognizing patterns in the data to make predictions.
4. Predictions are validated using the test set.
5. Model performance is evaluated based on correct predictions.
6. The trained model, used for pattern recognition with machine learning, is known as a classifier.
7. Predictions for unseen data are made using this classifier.
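The supervised workflow listed above can be sketched end to end. As a stand-in for the SVMs, decision trees, or random forests named in the text, this sketch trains a one-node decision tree (a "decision stump") on a single numeric feature; that choice, the function names, and the unshuffled split are all assumptions made to keep the example short.

```python
from typing import Callable, List, Tuple

Sample = Tuple[float, str]  # (feature value, class label)


def split_data(samples: List[Sample], train_fraction: float = 0.75):
    """Step 1: partition the data into training and test sets (no shuffling here)."""
    cut = int(len(samples) * train_fraction)
    return samples[:cut], samples[cut:]


def train_stump(training: List[Sample]) -> Callable[[float], str]:
    """Steps 2-3: learn the threshold rule that makes the fewest training errors."""
    labels = sorted({label for _, label in training})
    best = None
    for threshold, _ in training:
        for below in labels:
            for above in labels:
                errors = sum(
                    (below if x <= threshold else above) != label
                    for x, label in training
                )
                if best is None or errors < best[0]:
                    best = (errors, threshold, below, above)
    _, threshold, below, above = best
    # Steps 6-7: the returned function is the classifier used on unseen data.
    return lambda x: below if x <= threshold else above


def accuracy(classifier: Callable[[float], str], test: List[Sample]) -> float:
    """Steps 4-5: validate predictions on the test set and score them."""
    return sum(classifier(x) == label for x, label in test) / len(test)
```

A typical call sequence is `training, test = split_data(samples)`, then `classifier = train_stump(training)`, then `accuracy(classifier, test)`.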
  • 10. 2 Unsupervised Algorithms: These algorithms do not rely on labeled data or training sets. Instead, they group items based on similarities in their features without prior information.
Clustering Concept
1. Clustering involves grouping items with similar features.
2. No previous knowledge is available for identifying new items.
3. Machine learning algorithms like hierarchical and k-means clustering are used.
4. New objects are assigned to a group based on their features or properties to make predictions.
These methods offer distinct approaches: supervised algorithms use labeled data for training and testing, while unsupervised algorithms rely on similarity in features for clustering without prior information or labeled data.
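The clustering idea above, including the assignment of a new object to an existing group, can be illustrated with a plain k-means sketch. Seeding the centroids with the first k points and running a fixed number of iterations are simplifying assumptions, and the function names are illustrative.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def nearest(point: Point, centroids: Sequence[Point]) -> int:
    """Index of the centroid closest to `point`; also used to place new objects."""
    return min(range(len(centroids)), key=lambda i: squared_distance(point, centroids[i]))


def k_means(points: List[Point], k: int, iterations: int = 20) -> List[Point]:
    """Plain k-means; the first k points seed the centroids (a simplifying assumption)."""
    centroids = list(points[:k])
    for _ in range(iterations):
        groups: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Move each centroid to the mean of its group; keep it in place if the group is empty.
        centroids = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return centroids
```

No labels are used anywhere: groups emerge from feature similarity alone, and `nearest(new_point, centroids)` assigns a previously unseen object to a group.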
  • 11. Q7. Write a note on C4.5-DR.
• C4.5-DR stands for "C4.5 Decision Rules." It is an extension of the C4.5 algorithm, a well-known algorithm for constructing decision trees. C4.5 builds decision trees from a given set of training data to predict the class labels of instances.
• C4.5-DR extends the functionality of C4.5 by generating not only decision trees but also decision rules from the learned trees. These decision rules are derived from the tree structure and provide a more human-understandable representation of the decision-making process.
• The rules produced by C4.5-DR are essentially a set of if-then conditions based on the attributes or features of the dataset. These rules aim to capture patterns and relationships present in the data to help classify or predict the target variables. They are easier to interpret than complex tree structures, making them useful for providing transparent insights into the decision-making process of the algorithm.
Q8. Note on C4.5 algorithm DT
• The C4.5 algorithm constructs decision trees by recursively choosing the best attribute to split the data based on information gain ratio.
• It creates a tree by dividing the dataset into subsets, stopping when certain criteria are met, and then prunes the tree to prevent overfitting.
• C4.5 is used for classification tasks. It excels at handling various data types and generating understandable trees, but it can be computationally demanding and prone to overfitting on complex datasets.
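The information gain ratio that C4.5 uses to choose a splitting attribute can be sketched for a categorical attribute as the information gain divided by the split information. The function names are illustrative, and returning 0 when the attribute takes a single value is a convention adopted here, not something the slides specify.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy, in bits, of a sequence of labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def gain_ratio(attribute_values: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Information gain of splitting on the attribute, divided by the split information."""
    total = len(labels)
    subsets: Dict[Hashable, List[Hashable]] = {}
    for value, label in zip(attribute_values, labels):
        subsets.setdefault(value, []).append(label)
    # Expected entropy of the class labels after the split.
    remainder = sum(len(s) / total * entropy(s) for s in subsets.values())
    gain = entropy(labels) - remainder
    # Split information is the entropy of the attribute's own value distribution.
    split_info = entropy(attribute_values)
    return gain / split_info if split_info > 0 else 0.0
```

At each node, the attribute with the highest gain ratio would be selected, and the procedure would recurse on the resulting subsets.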
  • 12. Q9. CART algorithm and Gini Index.
• CART (Classification and Regression Trees) is a decision tree algorithm used for both classification and regression tasks. It constructs binary trees by recursively splitting the dataset based on feature thresholds that minimize impurity in the resulting child nodes.
• The Gini Index, a measure of impurity in decision trees, quantifies the probability of incorrectly classifying a randomly chosen element if it were labeled according to the distribution of labels in the node. It evaluates the homogeneity of the dataset; lower Gini values imply purer nodes and better separation of classes within a node. CART uses the Gini Index as a criterion for determining the best split while growing the tree.
Q10. Explain about MLPs.
• MLPs, or Multilayer Perceptrons, are a type of feedforward artificial neural network consisting of multiple layers of nodes or neurons. They are structured with an input layer, one or more hidden layers, and an output layer. Each node in a layer is connected to every node in the subsequent layer.
• In an MLP, information moves in a forward direction, passing through the network layer by layer. Neurons in the hidden layers use activation functions to process inputs and pass their outputs to the next layer. These networks are trained using supervised learning methods like backpropagation, adjusting the weights and biases to minimize the difference between predicted and actual outputs.
• MLPs are versatile and capable of learning complex patterns in data, making them widely used in various machine learning tasks such as classification, regression, and pattern recognition. They have been successfully applied in diverse fields, including finance, healthcare, natural language processing, and image recognition.
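Returning to the Gini Index from Q9, the impurity measure and its use in picking a threshold split can be sketched as below. The Gini Index of a node is 1 minus the sum of squared class proportions; the exhaustive search over observed values and the function names are assumptions for illustration, not a full CART implementation.

```python
from collections import Counter
from typing import Hashable, Sequence, Tuple


def gini(labels: Sequence[Hashable]) -> float:
    """Gini index of a node: 1 minus the sum of squared class proportions."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def best_binary_split(values: Sequence[float], labels: Sequence[Hashable]) -> Tuple[float, float]:
    """Return (threshold, weighted Gini) for the best split of the form value <= threshold."""
    total = len(labels)
    best = None
    # The largest value is skipped because it would leave the right child empty.
    for threshold in sorted(set(values))[:-1]:
        left = [label for v, label in zip(values, labels) if v <= threshold]
        right = [label for v, label in zip(values, labels) if v > threshold]
        weighted = len(left) / total * gini(left) + len(right) / total * gini(right)
        if best is None or weighted < best[1]:
            best = (threshold, weighted)
    if best is None:
        raise ValueError("need at least two distinct values to split")
    return best
```

A pure node scores 0, and an evenly mixed two-class node scores 0.5, so the split with the lowest weighted Gini produces the purest pair of child nodes.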
  • 13. Q11. Write a note on Competitive networks and competitive learning.
Competitive networks, based on competitive learning, are a type of neural network model used for unsupervised learning tasks such as clustering and vector quantization. They are composed of nodes that compete against each other to represent input data most effectively. Kohonen's Self-Organizing Maps (SOM) are a popular example of competitive networks.
In competitive learning, nodes or neurons in the network compete to become activated or respond to specific input patterns. The main features of competitive learning are:
1. Competition: Neurons in the network compete to become the most active or responsive based on the input data. The neuron that best matches or represents the input is selected.
2. Cooperation: While neurons compete, they also cooperate to collectively represent the input patterns. Neighboring neurons may also update their weights to adapt to similar input patterns.
3. Adaptation: Neurons adjust their weights in response to the input data. The winning neuron (the one most responsive to the input) updates its weights to better represent that input, while its neighbors might make smaller adjustments.
Competitive networks like SOMs create a topological mapping of input data onto a lower-dimensional grid or manifold. They organize the input space in a way that reveals the inherent structure and relationships among the data points, allowing for tasks such as clustering, visualization, and data compression.
These networks have applications in various fields, including data visualization, pattern recognition, and exploratory data analysis, where understanding the underlying structure of complex data is essential.
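One update step of competitive learning, covering competition, cooperation, and adaptation, might look like the sketch below. The one-dimensional row of neurons, the two learning rates, and the rule that only immediate neighbours cooperate are assumptions made for illustration; practical SOMs typically use a 2D grid and a neighbourhood that shrinks over time.

```python
from typing import List


def winner_index(weights: List[List[float]], x: List[float]) -> int:
    """Competition: the neuron whose weight vector is closest to the input wins."""
    return min(
        range(len(weights)),
        key=lambda i: sum((w - v) ** 2 for w, v in zip(weights[i], x)),
    )


def competitive_step(weights: List[List[float]], x: List[float],
                     rate: float = 0.5, neighbour_rate: float = 0.1) -> int:
    """Update `weights` in place for one input and return the winner's index."""
    win = winner_index(weights, x)
    for i, w in enumerate(weights):
        if i == win:
            step = rate            # adaptation: the winner moves most
        elif abs(i - win) == 1:
            step = neighbour_rate  # cooperation: immediate neighbours move a little
        else:
            continue               # all other neurons are left unchanged
        weights[i] = [wi + step * (xi - wi) for wi, xi in zip(w, x)]
    return win
```

Repeating this step over many inputs pulls neighbouring neurons toward similar regions of the input space, which is how the topological ordering arises.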
  • 14. Q12. What are SOMs? Explain their features and applications.
SOMs stand for Self-Organizing Maps, which are a type of artificial neural network used for unsupervised learning and data visualization. SOMs are a class of competitive learning networks developed by Teuvo Kohonen. They organize high-dimensional input data into a low-dimensional, usually 2D or 3D, grid or map.
The primary purpose of SOMs is to represent complex, high-dimensional data in a way that reflects the underlying structure and relationships among the data points. These networks consist of nodes or neurons arranged in a lattice-like structure, where each neuron is associated with a weight vector of the same dimensionality as the input data.
KEY FEATURES OF SOMs
• Topological Preservation: SOMs maintain the topology of the input space in the map. This means that nearby data points in the input space will be mapped to neighboring neurons in the SOM.
• Competitive Learning: Neurons compete with each other to become activated based on the similarity between their weight vectors and the input data. The neuron that best matches or is most responsive to the input becomes the winning neuron.
• Neighbourhood Cooperation: In SOMs, neighboring neurons also adapt their weights, albeit to a lesser extent than the winning neuron. This cooperative learning helps in preserving the topological structure and smooth mapping of the input space.
• Dimensionality Reduction: SOMs reduce the dimensionality of the input space while preserving the inherent relationships among data points. This reduction facilitates visualization and understanding of complex data.
APPLICATIONS of SOMs include data visualization, clustering, pattern recognition, and exploratory data analysis. They are widely used in various fields such as image analysis, natural language processing, recommendation systems, and data mining to reveal underlying patterns and structures within datasets.
  • 15. Copyright © 2023 Jayanti Rajdevendra Pande. All rights reserved. This content may be printed for personal use only. It may not be copied, distributed, or used for any other purpose without the express written permission of the copyright owner. This content is protected by copyright law. Any unauthorized use of the content may violate copyright laws and other applicable laws.
For any further queries, contact by email: jayantipande17@gmail.com
Image credits: Feedforward networks by Kiprono Elijah Koech, published in Towards Data Science; Recurrent Networks by Dinesh on Medium