Machine Learning
Department of Computer Science & Engineering
Presentation Material
Course Code: Semester: V
Course Title: Machine Learning Year: 2022
Faculty Name: Prof. Arunkumar, Dr. Tina Babu, Dr. Revathi V, Dr. Geetha, Prof. Ranjini
MODULE 3
Syllabus – Supervised Learning
Introduction to Supervised Learning, Introduction to the Perceptron model and its adaptive learning algorithms (Gradient Descent and Stochastic Gradient Descent), Introduction to classification, Naive Bayes classification, binary and multi-class classification, decision trees and random forest, Regression (methods of function estimation) – linear regression and non-linear regression, logistic regression, Introduction to kernel-based methods of machine learning: K-Nearest Neighbour, kernel functions, SVM, Introduction to ensemble-based learning methods
Introduction to Supervised Learning
• Machines are trained using well-"labelled" training data, and on the basis of that data they predict the output.
  – Labelled data means the input data is already tagged with the correct output.
• The training data provided to the machine acts as a supervisor that teaches it to predict the output correctly.
• Supervised learning is the process of providing input data together with the correct output data to a machine learning model. The aim of a supervised learning algorithm is to find a mapping function that maps the input variable (x) to the output variable (y).
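• To make "learning a mapping from x to y" concrete, here is a minimal sketch (added for illustration, not part of the original slides) using scikit-learn; the feature values and labels below are made-up toy data.

# Minimal supervised-learning sketch: learn a mapping f(x) -> y from labelled data.
# The tiny dataset is hypothetical; any labelled (X, y) pairs would do.
from sklearn.linear_model import LogisticRegression

X = [[0.2, 1.0], [0.4, 0.6], [1.5, 0.3], [1.8, 0.1]]  # input variables (x)
y = [0, 0, 1, 1]                                       # correct outputs (labels)

model = LogisticRegression()   # fits the mapping from X to y under "supervision" of the labels
model.fit(X, y)

print(model.predict([[1.6, 0.2]]))  # predict the label of a new, unseen input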
Introduction to Supervised Learning
1. Regression
• Used if there is a relationship between the input
variable and the output variable.
• It is used for the prediction of continuous variables,
such as Weather forecasting, Market Trends, etc.
Introduction to Supervised Learning
2. Classification
• Used when the output variable is categorical, i.e., there are discrete classes such as Yes/No, Male/Female, True/False, etc.
Introduction to Perceptron Model
• What is a perceptron?
• Neural Network In 5 Minutes | What Is A Neural Network? | How Neural Networks Work | Simplilearn – YouTube
• The perceptron is a building block of an Artificial Neural Network.
• The perceptron is a linear machine learning algorithm used for supervised learning of binary classifiers.
• The algorithm lets the artificial neuron learn from training examples, processing them one by one during training.
Introduction to Perceptron Model
• What is the Perceptron model in Machine Learning?
• A perceptron is also understood as an artificial neuron, or neural-network unit, that performs computations on the input data to detect features or patterns.
• The perceptron model is regarded as one of the simplest types of artificial neural networks; it is a supervised learning algorithm for binary classifiers.
Introduction to Perceptron Model
• Basic Components of Perceptron
• The perceptron can be viewed as a single-layer neural network with four main components (sketched in code below):
  – input values,
  – weights and bias,
  – net sum,
  – an activation function.
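• A minimal sketch of these four components in code (added for illustration; the input and weight values are arbitrary examples):

# One perceptron step: inputs -> weighted net sum + bias -> step activation.
inputs  = [1.0, 0.0, 1.0]      # input values x1..x3 (example values)
weights = [0.6, -0.4, 0.3]     # one weight per input
bias    = -0.5                 # bias term

net_sum = sum(w * x for w, x in zip(weights, inputs)) + bias   # net sum
output  = 1 if net_sum >= 0 else 0                             # step activation function
print(net_sum, output)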
Introduction to Perceptron Model
• Perceptron thus has the following three basic
elements
Introduction to Perceptron Model
• Why do we need Weights and Bias?
• As the network gets trained, it adjusts both parameters to achieve the desired values and the correct output.
• Weights – Weights measure the importance of each feature in predicting the output value.
• Features whose weights are close to zero have less significance: they contribute less to the prediction than features whose weights are further from zero (weights with a larger magnitude).
• Besides high-weighted features having greater predictive power than low-weighted ones, a weight can also be positive or negative.
Introduction to Perceptron Model
• Why do we need Weights and Bias?
• Bias – The bias delays or advances the trigger of the activation function; it acts like the intercept in a linear equation.
• The bias is a constant used to shift the output and helps the model provide the best fit for the given data.
Introduction to Perceptron Model
• Learning Rate – It’s a positive constant that is
used to moderate the degree to which weights
are changed at each step.
• What is Perceptron: A Beginners Guide for
Perceptron [Updated] (simplilearn.com)
Introduction to Perceptron Model
• Algorithm
Introduction to Perceptron Model
• Example 1 - 2 AND GATE Perceptron Training
Rule | Artificial Neural Networks Machine
Learning by Mahesh Huddar – YouTube
• Example 2 - 3. OR GATE Perceptron Training
Rule | Artificial Neural Networks Machine
Learning by Mahesh Huddar - YouTube
• Example 3 - Perceptron Rule to design XOR
Logic Gate Solved Example ANN Machine
Learning by Mahesh Huddar - YouTube
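• The AND-gate example referenced above can be reproduced with the perceptron training rule, w_i ← w_i + η·(t − o)·x_i. The sketch below is an illustrative implementation with an arbitrary learning rate and starting weights, not the exact numbers used in the video.

# Perceptron training rule on the AND gate: w_i <- w_i + eta * (t - o) * x_i
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]  # AND truth table
w = [0.0, 0.0]   # initial weights (arbitrary)
b = 0.0          # bias
eta = 0.1        # learning rate

for epoch in range(10):                    # a few passes over the training data
    for (x1, x2), t in data:
        o = 1 if w[0] * x1 + w[1] * x2 + b >= 0 else 0   # current perceptron output
        w[0] += eta * (t - o) * x1                        # weight updates
        w[1] += eta * (t - o) * x2
        b    += eta * (t - o)                             # bias update (its "input" is 1)

print(w, b)   # e.g. w = [0.2, 0.1], b = -0.3: a line that separates AND correctly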
Introduction to Perceptron Model
• 1. Gradient Descent | Delta Rule | Delta Rule
Derivation Nonlinearly Separable Data by
Mahesh Huddar - YouTube
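• A compact sketch of the delta rule referenced above: batch gradient descent for a linear unit o = w·x trained on the squared error E(w) = ½ Σ_d (t_d − o_d)². The data and learning rate below are illustrative toy values only.

import numpy as np

# Delta rule (batch gradient descent) for a linear unit o = w . x
# E(w) = 1/2 * sum_d (t_d - o_d)^2,  so  dE/dw = -sum_d (t_d - o_d) * x_d
X = np.array([[1.0, 0.5], [1.0, 1.5], [1.0, 2.5]])   # inputs (first column = constant 1 for the bias weight)
t = np.array([1.0, 2.0, 3.0])                        # target outputs (toy values)
w = np.zeros(2)                                      # initial weights
eta = 0.05                                           # learning rate

for step in range(200):
    o = X @ w                         # outputs for all training examples
    grad = -(t - o) @ X               # gradient of E(w) over the whole training set
    w = w - eta * grad                # move downhill along the error surface

print(w)   # approaches the least-squares weights for this toy data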
Introduction to Perceptron Model
• Limitations of Gradient Descent
1) converging to a local minimum can sometimes
be quite slow (i.e., it can require many
thousands of gradient descent steps)
2) if there are multiple local minima in the error
surface, then there is no guarantee that the
procedure will find the global minimum.
Introduction to Perceptron Model
• Stochastic Gradient Descent
• Also called Incremental Gradient Descent
  – It approximates the gradient descent search by updating the weights incrementally, following the calculation of the error for each individual example.
Introduction to Perceptron Model
• One way to view this stochastic gradient descent is to consider a distinct error function Ed(w), defined for each individual training example d, e.g. Ed(w) = ½ (td – od)².
• Stochastic gradient descent iterates over the training examples d in D, at each iteration altering the weights according to the gradient with respect to Ed(w).
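• Under the same toy setup as the batch sketch above, stochastic (incremental) gradient descent updates the weights after each individual example, using the gradient of Ed(w) instead of the full gradient of E(w). Illustrative values only:

import numpy as np

# Stochastic gradient descent for the same linear unit:
# weights are updated after every single training example d,
# using Ed(w) = 1/2 * (t_d - o_d)^2  =>  delta_w = eta * (t_d - o_d) * x_d
X = np.array([[1.0, 0.5], [1.0, 1.5], [1.0, 2.5]])
t = np.array([1.0, 2.0, 3.0])
w = np.zeros(2)
eta = 0.05

for epoch in range(200):
    for x_d, t_d in zip(X, t):            # iterate over the examples d in D
        o_d = x_d @ w                      # output for this single example
        w = w + eta * (t_d - o_d) * x_d    # incremental weight update

print(w)   # also approaches the least-squares weights, one example at a time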
Introduction to Perceptron Model
• The key differences between standard gradient descent and stochastic gradient descent are:
  – In standard gradient descent, the error is summed over all examples before updating the weights, whereas in stochastic gradient descent the weights are updated upon examining each training example.
  – Summing over multiple examples requires more computation per weight-update step; on the other hand, because it uses the true gradient, standard gradient descent is often used with a larger step size per update.
  – In cases where there are multiple local minima, stochastic gradient descent can sometimes avoid falling into them, because it uses the various gradients ∇Ed(w) rather than ∇E(w) to guide its search.
Supervised Learning: Popular
Supervised Algorithms
Machine Learning: Glimpse
Introduction to classification
Classification in Machine Learning
• Classification is a supervised machine learning method where the model tries to predict the correct label for a given input. In classification, the model is fully trained using the training data and then evaluated on test data before being used to make predictions on new, unseen data.
• For instance, an algorithm can learn to predict whether a given email is spam or ham (not spam), as illustrated below.
Lazy Learners Vs. Eager Learners
• Eager learners are machine learning algorithms that first build a model from the training dataset before making any predictions on future data. They spend more time during the training process, because they aim for better generalization by learning the model weights, but they require less time to make predictions.
• Most machine learning algorithms are eager learners, and below
are some examples:
• Logistic Regression.
• Support Vector Machine.
• Decision Trees.
• Artificial Neural Networks.
Lazy Learners Vs. Eager Learners
• Lazy learners or instance-based learners, on the other hand, do not
create any model immediately from the training data, and this is where
the lazy aspect comes from.
• They just memorize the training data, and each time there is a need to
make a prediction, they search for the nearest neighbor from the whole
training data, which makes them very slow during prediction. Some
examples of this kind are:
• K-Nearest Neighbor.
• Case-based reasoning.
Machine Learning Classification Vs.
Regression
Machine Learning Classification in
Real Life
Healthcare
Training a machine learning model on historical patient data can help healthcare
specialists accurately analyze their diagnoses:
• During the COVID-19 pandemic, machine learning models were implemented to
efficiently predict whether a person had COVID-19 or not.
Education
• Education is one of the domains dealing with the most textual, video, and audio
data. This unstructured information can be analyzed with the help of Natural
Language technologies to perform different tasks such as:
• The classification of documents per category.
Sustainable agriculture
• Agriculture is one of the most valuable pillars of human survival. Introducing sustainability can help improve farmers' productivity at different levels without damaging the environment:
• By using classification models to predict which type of land is suitable for a given
type of seed.
Different Types of Classification
Binary Classification
The goal is to classify the input data into two mutually exclusive categories. The training data in such a situation is labeled in a binary format: true and false; positive and negative; 0 and 1; spam and not spam, etc., depending on the problem being tackled.
For instance, we might want to detect whether a given image is a truck or a boat.
Logistic Regression and Support Vector Machines algorithms are natively designed for
binary classifications. However, other algorithms such as K-Nearest Neighbors and
Decision Trees can also be used for binary classification.
Multi-Class Classification
Multi-class classification, on the other hand, has more than two mutually exclusive class labels, where the goal is to predict to which class a given input example belongs. In the following case, the model correctly classified the image as a plane.
• Most of the binary classification algorithms can also be used for multi-class classification. These algorithms include, but are not limited to:
• Random Forest
• Naive Bayes
• K-Nearest Neighbors
• Gradient Boosting
• SVM
• Logistic Regression.
Multi-Class Classification
• Didn’t you say that SVM and Logistic Regression do not support multi-class
classification by default?
• → That’s correct. However, we can apply binary transformation approaches such as
one-versus-one and one-versus-all to adapt native binary classification algorithms for
multi-class classification tasks.
• One-versus-one: this strategy trains as many classifiers as there are pairs of labels. If
we have a 3-class classification, we will have three pairs of labels, thus three
classifiers, as shown below.
• For N labels, we will have Nx(N-1)/2 classifiers. Each classifier is trained on a single
binary dataset, and the final class is predicted by a majority vote between all the
classifiers. One-vs-one approach works best for SVM and other kernel-based
algorithms.
Multi-Class Classification
• One-versus-rest: at this stage, we start by considering each label as an
independent label and consider the rest combined as only one label. With
3-classes, we will have three classifiers.
• In general, for N labels, we will have N binary classifiers.
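• In scikit-learn both strategies are available as wrappers around any binary classifier. A minimal sketch (using the built-in 3-class iris dataset):

from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.svm import LinearSVC

X, y = load_iris(return_X_y=True)          # 3-class dataset, so N = 3 labels

ovo = OneVsOneClassifier(LinearSVC()).fit(X, y)    # trains N*(N-1)/2 = 3 binary classifiers
ovr = OneVsRestClassifier(LinearSVC()).fit(X, y)   # trains N = 3 binary classifiers

print(len(ovo.estimators_), len(ovr.estimators_))  # -> 3 and 3 for this data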
Multi-Label Classification
• In multi-label classification tasks, we try to predict 0 or more classes for
each input example. In this case, there is no mutual exclusion because
the input example can have more than one label.
• Such a scenario can be observed in different domains, such as auto-tagging in Natural Language Processing, where a given text can contain multiple topics. Similarly, in computer vision an image can contain multiple objects.
Multi-Label Classification
• It is not possible to use multi-class or binary classification
models to perform multi-label classification. However, most
algorithms used for those standard classification tasks have
their specialized versions for multi-label classification. We
can cite:
• Multi-label Decision Trees
• Multi-label Gradient Boosting
• Multi-label Random Forests
Imbalanced Classification
• In imbalanced classification, the number of examples is unevenly distributed across the classes, meaning that we can have more of one class than the others in the training data. Consider the following 3-class classification scenario where the training data contains 60% trucks, 25% planes, and 15% boats.
Imbalanced Classification
• The imbalanced classification problem can occur in scenarios such as:
• Fraudulent transaction detection in financial industries
• Rare disease diagnosis
• Customer churn analysis
• Using conventional predictive models such as Decision Trees, Logistic Regression, etc. may not be effective when dealing with an imbalanced dataset, because they might be biased toward predicting the class with the highest number of observations and treat those with fewer observations as noise.
• So, does that mean that such problems are left behind?
• Of course not! We can use multiple approaches to tackle the
imbalance problem in a dataset. The most commonly used
approaches include sampling techniques or harnessing the power
of cost-sensitive algorithms.
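• Two of the approaches mentioned above, shown as a hedged sketch: random oversampling of the minority class (a sampling technique), and a cost-sensitive model via class weights. The class sizes and feature values here are hypothetical.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.utils import resample

# Hypothetical imbalanced data: 90 examples of class 0, 10 of class 1.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (90, 2)), rng.normal(2, 1, (10, 2))])
y = np.array([0] * 90 + [1] * 10)

# 1) Sampling technique: oversample the minority class with replacement.
X_min, y_min = X[y == 1], y[y == 1]
X_up, y_up = resample(X_min, y_min, n_samples=90, replace=True, random_state=0)
X_bal = np.vstack([X[y == 0], X_up])
y_bal = np.concatenate([y[y == 0], y_up])

# 2) Cost-sensitive learning: weight errors on the rare class more heavily.
clf = LogisticRegression(class_weight="balanced").fit(X, y)

print(np.bincount(y_bal), clf.predict([[2.0, 2.0]]))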
Classification: Meaning
• The process of arranging data into homogeneous (similar) groups according to their common characteristics.
• Raw data cannot be easily understood, and it is not fit for further analysis and interpretation. Arrangement of data helps users in comparison and analysis.
• For example,
  – the population of a town can be grouped according to sex, age, marital status, etc.
• "Classification is the process of arranging data into sequences and groups according to their common characteristics, or separating them into different but related parts."
  – Prof. Secrist
Classification of Data
• The method of arranging data into “homogeneous classes”
according to the common features present in the data is known as
classification.
• A planned data analysis system makes the fundamental data easy
to find and recover.
– This can be of particular interest for legal discovery, risk
management, and compliance.
– Written methods and sets of guidelines for data classification should determine what levels and measures the company will use to organise data, and define the roles of employees within the business regarding information stewardship.
– Once a data-classification scheme has been designed, the security standards that stipulate proper access practices for each division, and the storage criteria that determine the data's lifecycle demands, should be discussed.
Classification of Data: Objectives
• To consolidate the volume of data in such a way that similarities and differences can be quickly understood. Figures can consequently be ordered in sections with common traits.
• To aid comparison.
• To point out the important characteristics of the data at a glance.
• To give importance to the prominent data collected while separating the optional elements.
• To allow statistical treatment of the material gathered.
Introduction to Classification
Naïve Bayes Classifier
Example 1: 1. Solved Example Naive Bayes Classifier
to classify New Instance PlayTennis Example
Mahesh Huddar – YouTube
Example 2: 2. Solved Example Naive Bayes Classifier
to classify New Instance | Species Example by
Mahesh Huddar – YouTube
Example 3: 3. Solved Example Naive Bayes Classifier
to classify New Instance Car Example by Mahesh
Huddar - YouTube
Bayes theorem in Multi class classification
• Exact Bayesian classification is technically impractical when we have many evidence variables (predictors) in our dataset. As the number of predictors increases, many records that we want to classify will not have an exact match in the training data.
• With, say, three evidence variables, Bayes' theorem gives
  P(Y | X1, X2, X3) = P(X1, X2, X3 | Y) · P(Y) / P(X1, X2, X3)
  and even with only three of them it is not easy to find an exact match for every combination of values.
Bayes theorem in Multi class classification
• The naive assumption is that the variables are independent given the class, so we can calculate the conditional probability as a product of per-variable terms:
  P(X1, X2, X3 | Y) = P(X1 | Y) · P(X2 | Y) · P(X3 | Y)
• By assuming conditional independence between the variables, we convert the Bayes equation into a simpler, "naive" one. Even though assuming independence between variables sounds simplistic, the Naive Bayes algorithm performs quite well in many classification tasks.
• For more detail: https://www.geeksforgeeks.org/naive-bayes-classifiers/
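• To show how the naive independence assumption reduces the computation to simple per-attribute counts, here is a small hand-rolled sketch on a hypothetical categorical dataset (the records and attribute names are made up for illustration; a real implementation would also add Laplace smoothing):

from collections import Counter, defaultdict

# Hypothetical records: (Outlook, Windy, Play) -- made-up values for illustration.
data = [("Sunny", "No", "Yes"), ("Sunny", "Yes", "No"), ("Rain", "No", "Yes"),
        ("Rain", "Yes", "No"), ("Overcast", "No", "Yes"), ("Sunny", "No", "No")]

class_counts = Counter(play for _, _, play in data)   # counts of each class Y -> P(Y)
attr_counts = defaultdict(Counter)                    # counts of attribute value given Y

for outlook, windy, play in data:
    attr_counts[(0, play)][outlook] += 1              # attribute 0 = Outlook
    attr_counts[(1, play)][windy] += 1                # attribute 1 = Windy

def posterior(outlook, windy):
    """Score each class with P(Y) * P(Outlook|Y) * P(Windy|Y) (the naive assumption)."""
    total = sum(class_counts.values())
    scores = {}
    for y, n_y in class_counts.items():
        p = n_y / total                               # prior P(Y)
        p *= attr_counts[(0, y)][outlook] / n_y       # P(Outlook | Y)
        p *= attr_counts[(1, y)][windy] / n_y         # P(Windy | Y)
        scores[y] = p
    return scores

print(posterior("Sunny", "No"))   # unnormalised scores; the larger one is the prediction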
Example for multiclass
Decision trees
❑ The decision tree is a powerful and popular tool for classification and prediction. A decision tree is a flowchart-like tree structure, where each internal node denotes a test on an attribute, each branch represents an outcome of the test, and each leaf node (terminal node) holds a class label.
https://youtu.be/RmajweUFKvM
Types of Decision Trees
Important Terminology related
to Decision Trees
1. Root Node: It represents the entire population or sample and
this further gets divided into two or more homogeneous sets.
2. Splitting: It is a process of dividing a node into two or more
sub-nodes.
3. Decision Node: When a sub-node splits into further sub-nodes, it is called a decision node.
4. Leaf / Terminal Node: Nodes that do not split are called leaf or terminal nodes.
5. Pruning: Removing sub-nodes of a decision node is called pruning. It can be seen as the opposite of splitting.
6. Branch / Sub-Tree: A subsection of the entire tree is called
branch or sub-tree.
7. Parent and Child Node: A node, which is divided into sub-
nodes is called a parent node of sub-nodes whereas sub-nodes
are the child of a parent node.
Decision Tree
Assumptions while creating
Decision Tree
• In the beginning, the whole training set is
considered as the root.
• Feature values are preferred to be categorical. If
the values are continuous then they are
discretized prior to building the model.
• Records are distributed recursively on the basis
of attribute values.
• The order in which attributes are placed as the root or as internal nodes of the tree is decided using a statistical approach.
Decision trees expressivity
• Decision Trees follow Sum of Product (SOP)
representation. The Sum of product (SOP) is also
known as Disjunctive Normal Form. For a class, every branch from the root of the tree to a leaf node with that class is a conjunction (product) of attribute-value tests; the different branches ending in that class form a disjunction (sum).
A Decision Tree for the concept PlayTennis
• This tree classifies Saturday mornings according to
whether or not they are suitable for playing tennis.
Decision trees expressivity
• Decision trees represent a disjunction of conjunctions of constraints on the attribute values; for the PlayTennis tree, for example:
  (Outlook = Sunny ∧ Humidity = Normal) ∨ (Outlook = Overcast) ∨ (Outlook = Rain ∧ Wind = Weak)
Training Dataset
How do Decision Trees work?
• Decision trees use multiple algorithms to decide
to split a node into two or more sub-nodes. The
creation of sub-nodes increases the
homogeneity of resultant sub-nodes. In other
words, we can say that the purity of the node
increases with respect to the target variable.
Algorithms used in Decision
Trees:
• ID3 → Iterative Dichotomiser 3
• C4.5 → successor of ID3
• CART → Classification And Regression Trees
• CHAID → Chi-square Automatic Interaction Detection (performs multi-level splits when computing classification trees)
• MARS → Multivariate Adaptive Regression Splines
Steps in ID3 algorithm:
1. It begins with the original set S as the root node.
2. On each iteration of the algorithm, it iterates through every unused attribute of the set S and calculates the Entropy (H) and Information Gain (IG) of this attribute.
3. It then selects the attribute which has the smallest entropy or, equivalently, the largest information gain.
4. The set S is then split by the selected attribute to produce subsets of the data.
5. The algorithm continues to recurse on each subset, considering only attributes never selected before.
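• scikit-learn does not implement ID3 exactly (its trees are CART-based and binary), but setting criterion="entropy" makes it choose splits by information gain, mirroring step 3 above. A minimal sketch on the built-in iris data:

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)

# criterion="entropy" -> splits are chosen by information gain, as in ID3's step 3.
tree = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=0)
tree.fit(X, y)

print(export_text(tree, feature_names=list(load_iris().feature_names)))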
Attribute Selection Measures
• If the dataset consists of N attributes then
deciding which attribute to place at the root or
at different levels of the tree as internal nodes is
a complicated step. Just randomly selecting any node to be the root does not solve the issue; if we follow a random approach, it may give us bad results with low accuracy.
Attribute Selection Measures
• Entropy
• Information Gain
• Gini Index
• Gain Ratio
• Reduction in Variance
• Chi-Square
Entropy
• Entropy is a measure of the randomness in the
information being processed. The higher the
entropy, the harder it is to draw any conclusions
from that information. Flipping a coin is an
example of an action that provides information
that is random.
• Entropy(S) = – Σi pi log2(pi)
• where S → the current state, and pi → the probability of event i in state S, or the percentage of class i in a node of state S.
Entropy definition
• If the target attribute can take on c different values, then the entropy of S relative to this c-wise classification is defined as
  Entropy(S) = Σ(i = 1..c) – pi log2(pi)
  where pi is the proportion of S belonging to class i.
Entropy
Entropy in binary classification
• Entropy measures the impurity of a collection of examples. It depends on the distribution of the random variable p.
  – S is a collection of training examples
  – p+ is the proportion of positive examples in S
  – p– is the proportion of negative examples in S
• Entropy(S) ≡ – p+ log2 p+ – p– log2 p–    [taking 0 log2 0 = 0]
• Entropy([14+, 0–]) = – 14/14 log2(14/14) – 0 log2(0) = 0
• Entropy([9+, 5–]) = – 9/14 log2(9/14) – 5/14 log2(5/14) = 0.94
• Entropy([7+, 7–]) = – 7/14 log2(7/14) – 7/14 log2(7/14) = 1/2 + 1/2 = 1    [log2(1/2) = –1]
• Note: the log of a number < 1 is negative; 0 ≤ p ≤ 1 and 0 ≤ entropy ≤ 1.
Example
Information Gain
• Information gain or IG is a statistical property
that measures how well a given attribute
separates the training examples according to
their target classification. Constructing a
decision tree is all about finding an attribute that
returns the highest information gain and the
smallest entropy.
Information Gain
Node purity
Entropy calculation
– Here the percentage of students who play cricket
is 0.5 and the percentage of students who do not
play cricket is of course also 0.5.
– Since log base two of 0.5 is –1, the entropy for this node will be 1.
Entropy calculation in a pure node
Entropy is zero here
Lower entropy means a purer node; higher entropy means a less pure node.
Information gain as entropy reduction
• A measure of the effectiveness of an attribute in classifying the training data is called information gain.
• It is the expected reduction in entropy caused by partitioning the examples according to this attribute.
• The information gain, Gain(S, A), of an attribute A relative to a collection of examples S is
  Gain(S, A) = Entropy(S) – Σ(v ∈ Values(A)) (|Sv| / |S|) · Entropy(Sv)
• https://youtu.be/coOTEc-0OGw
• Decision Tree | ID3 Algorithm | Solved
Numerical Example |
https://youtu.be/fs0wsU2sSPQ
• How to build a decision Tree for
Boolean Function |
Example
Example
Which attribute is the best classifier?
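• To make the comparison concrete, here is a small sketch that reproduces the Entropy([9+, 5–]) = 0.94 value from the earlier slide and computes Gain(S, A) for two candidate attributes. The Humidity/Wind split counts are the ones used in the standard PlayTennis example and are assumed here for illustration.

from math import log2

def entropy(pos, neg):
    """Entropy of a collection with `pos` positive and `neg` negative examples."""
    total = pos + neg
    e = 0.0
    for count in (pos, neg):
        p = count / total
        if p > 0:
            e -= p * log2(p)          # 0 * log2(0) is taken as 0
    return e

def gain(parent, subsets):
    """Gain(S, A) = Entropy(S) - sum(|Sv|/|S| * Entropy(Sv))."""
    total = sum(p + n for p, n in subsets)
    return entropy(*parent) - sum((p + n) / total * entropy(p, n) for p, n in subsets)

print(round(entropy(9, 5), 3))                       # 0.94, as on the slide
# Assumed PlayTennis-style splits of S = [9+, 5-]:
print(round(gain((9, 5), [(3, 4), (6, 1)]), 3))      # Humidity: ~0.151
print(round(gain((9, 5), [(6, 2), (3, 3)]), 3))      # Wind:     ~0.048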
Gini Index
The Gini index is calculated by subtracting the sum of the squared class probabilities from one:
  Gini = 1 – Σi (pi)²
It favors larger partitions and is easy to implement, whereas information gain favors smaller partitions with distinct values.
How to avoid/counter
Overfitting in Decision Trees?
• Building trees that “adapt too much” to the
training examples may lead to “overfitting”.
• Here are two ways to remove overfitting:
1. Pruning Decision Trees.
2. Random Forest
Pruning Decision Trees
• The splitting process results in fully grown
trees until the stopping criteria are reached.
But, the fully grown tree is likely to overfit
the data, leading to poor accuracy on unseen
data.
Pruning Decision Trees
Pruning Decision Trees
• In pruning, you trim off the branches of the tree,
i.e., remove the decision nodes starting from the
leaf node such that the overall accuracy is not
disturbed. This is done by segregating the actual
training set into two sets: training data set, D
and validation data set, V. Prepare the decision
tree using the segregated training data set, D.
Then continue trimming the tree accordingly to
optimize the accuracy of the validation data set,
V.
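• One practical way to carry out this "train on D, trim to maximise accuracy on V" idea in scikit-learn is cost-complexity pruning: grow the full tree on the training split, then pick the pruning strength ccp_alpha that scores best on the validation split. This is a sketch of that idea, not the exact procedure from the slide.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_D, X_V, y_D, y_V = train_test_split(X, y, test_size=0.3, random_state=0)  # D and V

full_tree = DecisionTreeClassifier(random_state=0).fit(X_D, y_D)
alphas = full_tree.cost_complexity_pruning_path(X_D, y_D).ccp_alphas  # candidate prunings

# Pick the amount of pruning that maximises accuracy on the validation set V.
best = max(alphas, key=lambda a: DecisionTreeClassifier(ccp_alpha=a, random_state=0)
           .fit(X_D, y_D).score(X_V, y_V))
pruned_tree = DecisionTreeClassifier(ccp_alpha=best, random_state=0).fit(X_D, y_D)
print(best, pruned_tree.score(X_V, y_V))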
Pruning Decision Trees
• In the example figure, the ‘Age’ attribute on the left-hand side of the tree has been pruned, as it has more importance on the right-hand side of the tree, hence removing overfitting.
Random Forest
• Random Forest is an example of ensemble
learning, in which we combine multiple machine
learning algorithms to obtain better predictive
performance.
• Why the name “Random”?
• Two key concepts that give it the name random:
1. A random sampling of training data set when building trees.
2. Random subsets of features considered when splitting nodes.
• The random forest algorithm solves the above challenge by combining the
predictions made by multiple decision trees and returning a single output. This
is done using an extension of a technique called bagging, or bootstrap
aggregation.
Random Forest
• Bagging is a procedure applied to reduce the variance of machine learning models. It works by averaging the predictions of models trained on different samples of the observations, which reduces variance.
• https://youtu.be/eM4uJ6XGnSM
Random Forest – here is how bagging works:
Bootstrap
• If we had more than one training dataset, we
could train multiple decision trees on each
dataset and average the results.
• However, since we usually only have one
training dataset in most real-world scenarios, a
statistical technique called bootstrap is used to
sample the dataset with replacement.
• Then, multiple decision trees are created, and
each tree is trained on a different data sample:
Bootstrap
Aggregation
• In this step, the prediction of each decision tree will be
combined to come up with a single output.
• In the case of a classification problem, a majority class
prediction is made:
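• Put together, bootstrap sampling plus majority-vote aggregation looks roughly like this (a sketch using scikit-learn decision trees on the built-in iris data):

import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(0)

# Bootstrap: draw several datasets of the same size, sampling rows with replacement,
# and train one decision tree on each sample.
trees = []
for _ in range(25):
    idx = rng.integers(0, len(X), size=len(X))        # row indices sampled with replacement
    trees.append(DecisionTreeClassifier().fit(X[idx], y[idx]))

# Aggregation: each tree votes, and the majority class is the final prediction.
votes = np.array([t.predict(X[:2]) for t in trees])   # predictions for two query points
majority = [np.bincount(col).argmax() for col in votes.T]
print(majority)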
Why do we randomly sample variables in the
random forest algorithm?
• In the random forest algorithm, it is not only
rows that are randomly sampled, but
variables too.
• This is because if we were to build multiple
decision trees with the same features, every
tree will be similar and highly correlated with
each other, potentially yielding the same
result. This will again lead to the issue of high
variance.
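• In scikit-learn both kinds of randomness are built into RandomForestClassifier: bootstrap row sampling (bootstrap=True, the default) and random feature subsets at each split (max_features). A minimal sketch:

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

forest = RandomForestClassifier(
    n_estimators=100,      # number of bootstrapped decision trees
    max_features="sqrt",   # random subset of features considered at each split
    bootstrap=True,        # each tree sees a random sample of rows, drawn with replacement
    random_state=0,
).fit(X, y)

print(forest.predict(X[:2]), forest.score(X, y))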
Decision Trees vs. Random Forests - Which
One Is Better and Why?
• Random forests typically perform better than decision trees due
to the following reasons:
• Random forests solve the problem of overfitting because they
combine the output of multiple decision trees to come up with a
final prediction.
• When you build a decision tree, a small change in data leads to a
huge difference in the model’s prediction. With a random forest,
this problem does not arise since the data is sampled many times
before generating a prediction.
• In terms of speed, however, random forests are slower, since more time is taken to construct multiple decision trees. Adding more trees to a random forest model will improve its accuracy to a certain extent, but it also increases computation time.
Decision Trees vs. Random Forests - Which One Is
Better and Why?
• decision trees are also easier to interpret than random forests
since they are straightforward. It is easy to visualize a decision tree
and understand how the algorithm reached its outcome. A
random forest is harder to deconstruct since it is more complex
and combines the output of multiple decision trees to make a
prediction.
Example: Random Forest
– Suppose there is a dataset that contains multiple
fruit images. So, this dataset is given to the
Random forest classifier. The dataset is divided
into subsets and given to each decision tree.
During the training phase, each decision tree
produces a prediction result, and when a new
data point occurs, then based on the majority of
results, the Random Forest classifier predicts the
final decision
Example:
Applications of Random Forest
• Banking: Banking sector mostly uses this
algorithm for the identification of loan risk.
• Medicine: With the help of this algorithm,
disease trends and risks of the disease can be
identified.
• Land Use: We can identify the areas of similar
land use by this algorithm.
• Marketing: Marketing trends can be identified
using this algorithm.
Issues in decision trees learning
• determining how deeply to grow the decision tree,
• handling continuous attributes,
• choosing an appropriate attribute selection measure,
• handling training data with missing attribute values,
• handling attributes with differing costs,
• improving computational efficiency.
More Related Content

What's hot

Control Strategies in AI
Control Strategies in AI Control Strategies in AI
Control Strategies in AI Bharat Bhushan
 
ProLog (Artificial Intelligence) Introduction
ProLog (Artificial Intelligence) IntroductionProLog (Artificial Intelligence) Introduction
ProLog (Artificial Intelligence) Introductionwahab khan
 
backpropagation in neural networks
backpropagation in neural networksbackpropagation in neural networks
backpropagation in neural networksAkash Goel
 
Methods of Optimization in Machine Learning
Methods of Optimization in Machine LearningMethods of Optimization in Machine Learning
Methods of Optimization in Machine LearningKnoldus Inc.
 
k medoid clustering.pptx
k medoid clustering.pptxk medoid clustering.pptx
k medoid clustering.pptxRoshan86572
 
Feature Extraction
Feature ExtractionFeature Extraction
Feature Extractionskylian
 
Lexical analyzer generator lex
Lexical analyzer generator lexLexical analyzer generator lex
Lexical analyzer generator lexAnusuya123
 
FUNCTION APPROXIMATION
FUNCTION APPROXIMATIONFUNCTION APPROXIMATION
FUNCTION APPROXIMATIONankita pandey
 
Artificial nueral network slideshare
Artificial nueral network slideshareArtificial nueral network slideshare
Artificial nueral network slideshareRed Innovators
 
Autoencoders in Deep Learning
Autoencoders in Deep LearningAutoencoders in Deep Learning
Autoencoders in Deep Learningmilad abbasi
 
Principles of programming languages. Detail notes
Principles of programming languages. Detail notesPrinciples of programming languages. Detail notes
Principles of programming languages. Detail notesVIKAS SINGH BHADOURIA
 
Introduction to Recurrent Neural Network
Introduction to Recurrent Neural NetworkIntroduction to Recurrent Neural Network
Introduction to Recurrent Neural NetworkKnoldus Inc.
 
NAIVE BAYES CLASSIFIER
NAIVE BAYES CLASSIFIERNAIVE BAYES CLASSIFIER
NAIVE BAYES CLASSIFIERKnoldus Inc.
 
Regularization in deep learning
Regularization in deep learningRegularization in deep learning
Regularization in deep learningKien Le
 
Types of clustering and different types of clustering algorithms
Types of clustering and different types of clustering algorithmsTypes of clustering and different types of clustering algorithms
Types of clustering and different types of clustering algorithmsPrashanth Guntal
 

What's hot (20)

Control Strategies in AI
Control Strategies in AI Control Strategies in AI
Control Strategies in AI
 
Bayesian learning
Bayesian learningBayesian learning
Bayesian learning
 
Java programming -Object-Oriented Thinking- Inheritance
Java programming -Object-Oriented Thinking- InheritanceJava programming -Object-Oriented Thinking- Inheritance
Java programming -Object-Oriented Thinking- Inheritance
 
ProLog (Artificial Intelligence) Introduction
ProLog (Artificial Intelligence) IntroductionProLog (Artificial Intelligence) Introduction
ProLog (Artificial Intelligence) Introduction
 
backpropagation in neural networks
backpropagation in neural networksbackpropagation in neural networks
backpropagation in neural networks
 
Methods of Optimization in Machine Learning
Methods of Optimization in Machine LearningMethods of Optimization in Machine Learning
Methods of Optimization in Machine Learning
 
Fuzzy Membership Function
Fuzzy Membership Function Fuzzy Membership Function
Fuzzy Membership Function
 
k medoid clustering.pptx
k medoid clustering.pptxk medoid clustering.pptx
k medoid clustering.pptx
 
Feature Extraction
Feature ExtractionFeature Extraction
Feature Extraction
 
Lexical analyzer generator lex
Lexical analyzer generator lexLexical analyzer generator lex
Lexical analyzer generator lex
 
FUNCTION APPROXIMATION
FUNCTION APPROXIMATIONFUNCTION APPROXIMATION
FUNCTION APPROXIMATION
 
Cs419 lec10 left recursion and left factoring
Cs419 lec10   left recursion and left factoringCs419 lec10   left recursion and left factoring
Cs419 lec10 left recursion and left factoring
 
Artificial nueral network slideshare
Artificial nueral network slideshareArtificial nueral network slideshare
Artificial nueral network slideshare
 
Autoencoders in Deep Learning
Autoencoders in Deep LearningAutoencoders in Deep Learning
Autoencoders in Deep Learning
 
Principles of programming languages. Detail notes
Principles of programming languages. Detail notesPrinciples of programming languages. Detail notes
Principles of programming languages. Detail notes
 
Introduction to Recurrent Neural Network
Introduction to Recurrent Neural NetworkIntroduction to Recurrent Neural Network
Introduction to Recurrent Neural Network
 
Artificial Neural Network Topology
Artificial Neural Network TopologyArtificial Neural Network Topology
Artificial Neural Network Topology
 
NAIVE BAYES CLASSIFIER
NAIVE BAYES CLASSIFIERNAIVE BAYES CLASSIFIER
NAIVE BAYES CLASSIFIER
 
Regularization in deep learning
Regularization in deep learningRegularization in deep learning
Regularization in deep learning
 
Types of clustering and different types of clustering algorithms
Types of clustering and different types of clustering algorithmsTypes of clustering and different types of clustering algorithms
Types of clustering and different types of clustering algorithms
 

Similar to part3Module 3 ppt_with classification.pptx

Presentation1.pptx
Presentation1.pptxPresentation1.pptx
Presentation1.pptxnarmeen11
 
Machine learning
Machine learningMachine learning
Machine learninghplap
 
EssentialsOfMachineLearning.pdf
EssentialsOfMachineLearning.pdfEssentialsOfMachineLearning.pdf
EssentialsOfMachineLearning.pdfAnkita Tiwari
 
Machine learning ppt.
Machine learning ppt.Machine learning ppt.
Machine learning ppt.ASHOK KUMAR
 
Winning Kaggle 101: Introduction to Stacking
Winning Kaggle 101: Introduction to StackingWinning Kaggle 101: Introduction to Stacking
Winning Kaggle 101: Introduction to StackingTed Xiao
 
Artificial Neural Networks for data mining
Artificial Neural Networks for data miningArtificial Neural Networks for data mining
Artificial Neural Networks for data miningALIZAIB KHAN
 
Hyperparameter Tuning
Hyperparameter TuningHyperparameter Tuning
Hyperparameter TuningJon Lederman
 
Modelling and evaluation
Modelling and evaluationModelling and evaluation
Modelling and evaluationeShikshak
 
MACHINE LEARNING YEAR DL SECOND PART.pptx
MACHINE LEARNING YEAR DL SECOND PART.pptxMACHINE LEARNING YEAR DL SECOND PART.pptx
MACHINE LEARNING YEAR DL SECOND PART.pptxNAGARAJANS68
 
Deep learning crash course
Deep learning crash courseDeep learning crash course
Deep learning crash courseVishwas N
 
unit 1.2 supervised learning.pptx
unit 1.2 supervised learning.pptxunit 1.2 supervised learning.pptx
unit 1.2 supervised learning.pptxDr.Shweta
 
ML Module 3 Non Linear Learning.pptx
ML Module 3 Non Linear Learning.pptxML Module 3 Non Linear Learning.pptx
ML Module 3 Non Linear Learning.pptxDebabrataPain1
 
AI_Unit-4_Learning.pptx
AI_Unit-4_Learning.pptxAI_Unit-4_Learning.pptx
AI_Unit-4_Learning.pptxMohammadAsim91
 
Artificial Intelligence Approaches
Artificial Intelligence  ApproachesArtificial Intelligence  Approaches
Artificial Intelligence ApproachesJincy Nelson
 

Similar to part3Module 3 ppt_with classification.pptx (20)

Presentation1.pptx
Presentation1.pptxPresentation1.pptx
Presentation1.pptx
 
Machine learning
Machine learningMachine learning
Machine learning
 
EssentialsOfMachineLearning.pdf
EssentialsOfMachineLearning.pdfEssentialsOfMachineLearning.pdf
EssentialsOfMachineLearning.pdf
 
Endsem AI merged.pdf
Endsem AI merged.pdfEndsem AI merged.pdf
Endsem AI merged.pdf
 
Machine learning ppt.
Machine learning ppt.Machine learning ppt.
Machine learning ppt.
 
crossvalidation.pptx
crossvalidation.pptxcrossvalidation.pptx
crossvalidation.pptx
 
Winning Kaggle 101: Introduction to Stacking
Winning Kaggle 101: Introduction to StackingWinning Kaggle 101: Introduction to Stacking
Winning Kaggle 101: Introduction to Stacking
 
Artificial Neural Networks for data mining
Artificial Neural Networks for data miningArtificial Neural Networks for data mining
Artificial Neural Networks for data mining
 
Artificial Neural Networks for Data Mining
Artificial Neural Networks for Data MiningArtificial Neural Networks for Data Mining
Artificial Neural Networks for Data Mining
 
Statistical learning intro
Statistical learning introStatistical learning intro
Statistical learning intro
 
Hyperparameter Tuning
Hyperparameter TuningHyperparameter Tuning
Hyperparameter Tuning
 
Modelling and evaluation
Modelling and evaluationModelling and evaluation
Modelling and evaluation
 
Deep learning - a primer
Deep learning - a primerDeep learning - a primer
Deep learning - a primer
 
Deep learning - a primer
Deep learning - a primerDeep learning - a primer
Deep learning - a primer
 
MACHINE LEARNING YEAR DL SECOND PART.pptx
MACHINE LEARNING YEAR DL SECOND PART.pptxMACHINE LEARNING YEAR DL SECOND PART.pptx
MACHINE LEARNING YEAR DL SECOND PART.pptx
 
Deep learning crash course
Deep learning crash courseDeep learning crash course
Deep learning crash course
 
unit 1.2 supervised learning.pptx
unit 1.2 supervised learning.pptxunit 1.2 supervised learning.pptx
unit 1.2 supervised learning.pptx
 
ML Module 3 Non Linear Learning.pptx
ML Module 3 Non Linear Learning.pptxML Module 3 Non Linear Learning.pptx
ML Module 3 Non Linear Learning.pptx
 
AI_Unit-4_Learning.pptx
AI_Unit-4_Learning.pptxAI_Unit-4_Learning.pptx
AI_Unit-4_Learning.pptx
 
Artificial Intelligence Approaches
Artificial Intelligence  ApproachesArtificial Intelligence  Approaches
Artificial Intelligence Approaches
 

More from VaishaliBagewadikar

More from VaishaliBagewadikar (7)

SEPM_MODULE 2 PPT.pptx
SEPM_MODULE 2 PPT.pptxSEPM_MODULE 2 PPT.pptx
SEPM_MODULE 2 PPT.pptx
 
Module-4_Part-II.pptx
Module-4_Part-II.pptxModule-4_Part-II.pptx
Module-4_Part-II.pptx
 
Module-3_SVM_Kernel_KNN.pptx
Module-3_SVM_Kernel_KNN.pptxModule-3_SVM_Kernel_KNN.pptx
Module-3_SVM_Kernel_KNN.pptx
 
chapter3.pptx
chapter3.pptxchapter3.pptx
chapter3.pptx
 
Module 2 softcomputing.pptx
Module 2 softcomputing.pptxModule 2 softcomputing.pptx
Module 2 softcomputing.pptx
 
SC1.pptx
SC1.pptxSC1.pptx
SC1.pptx
 
FuzzyRelations.pptx
FuzzyRelations.pptxFuzzyRelations.pptx
FuzzyRelations.pptx
 

Recently uploaded

Final demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptxFinal demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptxAvyJaneVismanos
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxGaneshChakor2
 
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxEPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxRaymartEstabillo3
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxthorishapillay1
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxiammrhaywood
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...Marc Dusseiller Dusjagr
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxpboyjonauth
 
भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,Virag Sontakke
 
Pharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfPharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfMahmoud M. Sallam
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentInMediaRes1
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxmanuelaromero2013
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptxVS Mahajan Coaching Centre
 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Celine George
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Educationpboyjonauth
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon AUnboundStockton
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdfssuser54595a
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationnomboosow
 

Recently uploaded (20)

Final demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptxFinal demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptx
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptx
 
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxEPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptx
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
 
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptx
 
भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,
 
Pharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfPharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdf
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media Component
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptx
 
ESSENTIAL of (CS/IT/IS) class 06 (database)
ESSENTIAL of (CS/IT/IS) class 06 (database)ESSENTIAL of (CS/IT/IS) class 06 (database)
ESSENTIAL of (CS/IT/IS) class 06 (database)
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Education
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon A
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 

part3Module 3 ppt_with classification.pptx

  • 1. Machine Learning 1 Department of Computer Science & Engineering Course Name & Course Code Presentation Material Department of Computer Science & Engineering Course Code: Semester: V Course Title: Machine Learning Year: 2022 Faculty Name: Prof. Arunkumar, Dr. Tina Babu , Dr. Revathi V, Dr. Geetha, Prof. Ranjini
  • 2. MODULE 3 Syllabus – Supervised Learning Introduction to Supervised Learning, Introduction to Perceptron model and its adaptive learning algorithms (gradient Decent and Stochastic Gradient Decent), Introduction to classification, Naive Bayes classification Binary and multi class Classification, decision trees and random forest, Regression (methods of function estimation) --Linear regression and Non-linear regression, logistic regression, Introduction To Kernel Based Methods of machine learning: K-Nearest neighbourhood, kernel functions, SVM, Introduction to ensemble based learning methods 2 Department of Computer Science & Engineering Course Name & Course Code
  • 3. Introduction to Supervised Learning • Machines are trained using well "labelled" training data, and on basis of that data, machines predict the output. – The labelled data means some input data is already tagged with the correct output. • The training data provided to the machines work as the supervisor that teaches the machines to predict the output correctly. • Supervised learning is a process of providing input data as well as correct output data to the machine learning model. The aim of a supervised learning algorithm is to find a mapping function to map the input variable(x) with the output variable(y).
  • 7. Introduction to Supervised Learning 1. Regression • Used if there is a relationship between the input variable and the output variable. • It is used for the prediction of continuous variables, such as Weather forecasting, Market Trends, etc.
  • 8. Introduction to Supervised Learning 2. Classification • Used when the output variable is categorical, which means there are two classes such as Yes-No, Male- Female, True-false, etc.
  • 15. Introduction to Perceptron Model • What is perceptron? • Neural Network In 5 Minutes | What Is A Neural Network? | How Neural Networks Work | Simplilearn – YouTube • Perceptron is a building block of an Artificial Neural Network. • Perceptron is a linear Machine Learning algorithm used for supervised learning for various binary classifiers. • This algorithm enables neurons to learn elements and processes them one by one during preparation.
  • 16. Introduction to Perceptron Model • What is the Perceptron model in Machine Learning? • Perceptron is also understood as an Artificial Neuron or neural network unit that helps to detect certain input data computations in business intelligence. • Perceptron model is also treated as one of the best and simplest types of Artificial Neural networks. However, it is a supervised learning algorithm of binary classifiers.
  • 18. Introduction to Perceptron Model • Basic Components of Perceptron • it as a single-layer neural network with four main parameters – input values, – weights and Bias, – net sum, – an activation function.
  • 20. Introduction to Perceptron Model • Perceptron thus has the following three basic elements
  • 22. Introduction to Perceptron Model • Why do we Need Weight and Bias? • Network gets trained it adjusts both parameters to achieve the desired values and the correct output. • Weights - Weights are used to measure the importance of each feature in predicting output value. • Features with values close to zero are said to have lesser weight or significance. These have less importance in the prediction process compared to the features with values further from zero known as weights with a larger value. • Besides high-weighted features having greater predictive power than low-weighting ones, the weight can also be positive or negative.
  • 23. Introduction to Perceptron Model • Why do we Need Weight and Bias? • Bias - bias delays the trigger of the activation function. It acts like an intercept in a linear equation. • Bias is a constant used to adjust the output and help the model to provide the best fit output for the given data.
  • 28. Introduction to Perceptron Model • Learning Rate – It’s a positive constant that is used to moderate the degree to which weights are changed at each step. • What is Perceptron: A Beginners Guide for Perceptron [Updated] (simplilearn.com)
  • 30. Introduction to Perceptron Model • Algorithm
  • 35. Introduction to Perceptron Model • Example 1 - 2 AND GATE Perceptron Training Rule | Artificial Neural Networks Machine Learning by Mahesh Huddar – YouTube • Example 2 - 3. OR GATE Perceptron Training Rule | Artificial Neural Networks Machine Learning by Mahesh Huddar - YouTube • Example 3 - Perceptron Rule to design XOR Logic Gate Solved Example ANN Machine Learning by Mahesh Huddar - YouTube
  • 36. Introduction to Perceptron Model • 1. Gradient Descent | Delta Rule | Delta Rule Derivation Nonlinearly Separable Data by Mahesh Huddar - YouTube
  • 44. Introduction to Perceptron Model • Limitations of Gradient Descent 1) converging to a local minimum can sometimes be quite slow (i.e., it can require many thousands of gradient descent steps) 2) if there are multiple local minima in the error surface, then there is no guarantee that the procedure will find the global minimum.
  • 45. Introduction to Perceptron Model • Stochastic Gradient Descent • Incremental Gradient Descent – approximate this gradient descent search by updating weights incrementally, following the calculation of the error for each individual example.
  • 47. Introduction to Perceptron Model • One way to view this stochastic gradient descent is to consider a distinct error function defined for each individual training example d as follows • Stochastic gradient descent iterates over the training examples d in D, at each iteration altering the weights according to the gradient with respect to
  • 48. Introduction to Perceptron Model • The key differences between standard gradient descent and stochastic gradient descent are:
  • 49. Introduction to Perceptron Model V Sem – Machine Learning Department of Computer Science & Engineering
  • 50. Supervised Learning: Popular Supervised Algorithms V Sem – Machine Learning Department of Computer Science & Engineering
  • 51. Machine Learning: Glimpse V Sem – Machine Learning Department of Computer Science & Engineering
  • 52. Introduction to classification 52 V Sem – Machine Learning Department of Computer Science & Engineering
  • 53. Classification in Machine Learning • Classification is a supervised machine learning method where the model tries to predict the correct label of a given input data. In classification, the model is fully trained using the training data, and then it is evaluated on test data before being used to perform prediction on new unseen data. • For instance, an algorithm can learn to predict whether a given email is spam or ham (no spam), as illustrated below.
  • 54. Lazy Learners Vs. Eager Learners • Eager learners are machine learning algorithms that first build a model from the training dataset before making any prediction on future datasets. They spend more time during the training process because of their eagerness to have a better generalization during the training from learning the weights, but they require less time to make predictions. • Most machine learning algorithms are eager learners, and below are some examples: • Logistic Regression. • Support Vector Machine. • Decision Trees. • Artificial Neural Networks.
  • 55. Lazy Learners Vs. Eager Learners • Lazy learners or instance-based learners, on the other hand, do not create any model immediately from the training data, and this is where the lazy aspect comes from. • They just memorize the training data, and each time there is a need to make a prediction, they search for the nearest neighbor from the whole training data, which makes them very slow during prediction. Some examples of this kind are: • K-Nearest Neighbor. • Case-based reasoning.
  • 57. Machine Learning Classification in Real Life Healthcare Training a machine learning model on historical patient data can help healthcare specialists accurately analyze their diagnoses: • During the COVID-19 pandemic, machine learning models were implemented to efficiently predict whether a person had COVID-19 or not. Education • Education is one of the domains dealing with the most textual, video, and audio data. This unstructured information can be analyzed with the help of Natural Language technologies to perform different tasks such as: • The classification of documents per category. Sustainable agriculture • Agriculture is one of the most valuable pillars of human survival. Introducing sustainability can help improve farmers' productivity at a different level without damaging the environment: • By using classification models to predict which type of land is suitable for a given type of seed.
  • 58. Different Types of Classification Binary Classification The goal is to classify the input data into two mutually exclusive categories. The training data in such a situation is labeled in a binary format: true and false; positive and negative; 0 and 1; spam and not spam, etc., depending on the problem being tackled. For instance, we might want to detect whether a given image is a truck or a boat. Logistic Regression and Support Vector Machine algorithms are natively designed for binary classification. However, other algorithms such as K-Nearest Neighbors and Decision Trees can also be used for binary classification.
  • 59. Multi-Class Classification Multi-class classification, on the other hand, has more than two mutually exclusive class labels, and the goal is to predict which class a given input example belongs to. In the following case, the model correctly classified the image to be a plane. • Most of the binary classification algorithms can also be used for multi-class classification. These algorithms include, but are not limited to: • Random Forest • Naive Bayes • K-Nearest Neighbors • Gradient Boosting • SVM • Logistic Regression.
  • 60. Multi-Class Classification • Didn’t you say that SVM and Logistic Regression do not support multi-class classification by default? • → That’s correct. However, we can apply binary transformation approaches such as one-versus-one and one-versus-all to adapt native binary classification algorithms to multi-class classification tasks. • One-versus-one: this strategy trains as many classifiers as there are pairs of labels. If we have a 3-class classification problem, we will have three pairs of labels, and thus three classifiers, as shown below. • For N labels, we will have N×(N−1)/2 classifiers. Each classifier is trained on a single binary dataset, and the final class is predicted by a majority vote between all the classifiers. The one-vs-one approach works best for SVM and other kernel-based algorithms.
  • 61. Multi-Class Classification • One-versus-rest: with this strategy, we treat each label in turn as the positive class and combine all the remaining labels into a single negative class. With 3 classes, we will have three classifiers. • In general, for N labels, we will have N binary classifiers.
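  A minimal sketch (scikit-learn assumed; the iris dataset is used only as a convenient 3-class example) of wrapping a binary linear SVM with the one-versus-one and one-versus-rest strategies:

```python
from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.svm import LinearSVC

X, y = load_iris(return_X_y=True)                      # 3 classes

ovo = OneVsOneClassifier(LinearSVC(max_iter=10000)).fit(X, y)
ovr = OneVsRestClassifier(LinearSVC(max_iter=10000)).fit(X, y)

# For N = 3 labels: OvO trains N*(N-1)/2 = 3 classifiers, OvR trains N = 3.
print("one-vs-one classifiers:", len(ovo.estimators_))
print("one-vs-rest classifiers:", len(ovr.estimators_))
```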
  • 62. Multi-Label Classification • In multi-label classification tasks, we try to predict 0 or more classes for each input example. In this case, there is no mutual exclusion, because a single input example can have more than one label. • Such a scenario can be observed in different domains, such as auto-tagging in Natural Language Processing, where a given text can cover multiple topics. Similarly, in computer vision, an image can contain multiple objects.
  • 63. Multi-Label Classification • It is not possible to use multi-class or binary classification models to perform multi-label classification. However, most algorithms used for those standard classification tasks have their specialized versions for multi-label classification. We can cite: • Multi-label Decision Trees • Multi-label Gradient Boosting • Multi-label Random Forests
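  As an illustration (not from the slides; scikit-learn assumed to be available), a random forest can be trained directly on a multi-label target, where each example carries a 0/1 indicator vector of labels; the synthetic dataset and parameters below are made up for the example.

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.ensemble import RandomForestClassifier

# y has shape (n_samples, n_labels); each row is a 0/1 vector of labels.
X, y = make_multilabel_classification(n_samples=200, n_classes=4, random_state=0)

clf = RandomForestClassifier(random_state=0).fit(X, y)
print(clf.predict(X[:3]))   # e.g. [[0 1 1 0], ...] -- several labels per example
```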
  • 64. Imbalanced Classification • For the imbalanced classification, the number of examples is unevenly distributed in each class, meaning that we can have more of one class than the others in the training data. Let’s consider the following 3-class classification scenario where the training data contains: 60% of trucks, 25% of planes, and 15% of boats.
  • 65. Imbalanced Classification • The imbalanced classification problem can occur in scenarios such as: • Fraudulent transaction detection in financial industries • Rare disease diagnosis • Customer churn analysis • Using conventional predictive models such as Decision Trees, Logistic Regression, etc. may not be effective when dealing with an imbalanced dataset, because they tend to be biased toward predicting the class with the most observations and to treat the classes with fewer observations as noise. • So, does that mean that such problems are left behind? • Of course not! We can use multiple approaches to tackle the imbalance problem in a dataset. The most commonly used approaches include sampling techniques and cost-sensitive algorithms, as sketched below.
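  As a hedged sketch (scikit-learn assumed; the data and class weights are illustrative), the snippet below shows the two approaches just mentioned: a cost-sensitive model via class_weight='balanced', and a simple random oversampling of the minority class.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.utils import resample

X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)

# 1) Cost-sensitive learning: errors on the rare class are penalised more.
clf = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X, y)

# 2) Oversampling: duplicate minority-class rows until the classes are even.
X_min, y_min = X[y == 1], y[y == 1]
X_up, y_up = resample(X_min, y_min, n_samples=(y == 0).sum(), random_state=0)
X_bal = np.vstack([X[y == 0], X_up])
y_bal = np.concatenate([y[y == 0], y_up])
print("balanced class counts:", np.bincount(y_bal))
```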
  • 66. Classification: Meaning • Process of arranging data into homogeneous (similar) groups according to their common characteristics. • Raw data cannot be easily understood, and it is not fit for further analysis and interpretation. Arrangement of data helps users in comparison and analysis. • For example, – the population of a town can be grouped according to sex, age, marital status, etc. • “Classification is the process of arranging data into sequences according to their common characteristics or separating them into different related parts.” – Prof. Secrist
  • 67. Classification of Data • The method of arranging data into “homogeneous classes” according to the common features present in the data is known as classification. • A planned data classification system makes the fundamental data easy to find and recover. – This can be of particular interest for legal discovery, risk management, and compliance. – Written methods and sets of guidelines for data classification should determine what levels and measures the company will use to organise data, and define the roles of employees within the business regarding input stewardship. – Once a data-classification scheme has been designed, the security standards that stipulate appropriate handling practices for each category, and the storage criteria that determine the data’s lifecycle requirements, should be discussed.
  • 68. Classification of Data: Objectives • To consolidate the volume of data in such a way that similarities and differences can be quickly understood. Figures can consequently be ordered into sections with common traits. • To aid comparison. • To point out the important characteristics of the data at a glance. • To give importance to the prominent data collected while separating the optional elements. • To allow statistical treatment of the material gathered.
  • 69. Introduction to Classification
  • 80. Naïve Bayes Classifier Example 1: 1. Solved Example Naive Bayes Classifier to classify New Instance PlayTennis Example Mahesh Huddar – YouTube Example 2: 2. Solved Example Naive Bayes Classifier to classify New Instance | Species Example by Mahesh Huddar – YouTube Example 3: 3. Solved Example Naive Bayes Classifier to classify New Instance Car Example by Mahesh Huddar - YouTube
  • 81. Bayes theorem in Multi class classification • The exact Bayesian classification is technically impractical since we have many evidence variables (predictors) in our dataset. When the number of predictors increases, many records that we want to classify will not have an exact match. • The above equation shows only the case where we have 3 evidence variables and even with only 3 of them it is not easy to find an exact match.
  • 82. Bayes theorem in Multi class classification • The naive assumption is that the variables are independent given the class. So we can calculate the conditional probability as follows: P(class | x1, …, xn) ∝ P(class) × P(x1 | class) × … × P(xn | class). • By assuming conditional independence between the variables, we convert the Bayes equation into a simpler, “naive” one. Even though assuming independence between variables sounds simplistic, the Naive Bayes algorithm performs quite well in many classification tasks. • For more detail: https://www.geeksforgeeks.org/naive-bayes-classifiers/
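  To make the naive computation concrete, here is a minimal sketch in plain Python. The tiny weather-style dataset and attribute values are invented for illustration, and Laplace smoothing (which a practical implementation would add to avoid zero probabilities) is omitted.

```python
from collections import Counter

data = [  # (outlook, windy) -> play
    ("sunny", "no", "yes"), ("sunny", "yes", "no"), ("rain", "no", "yes"),
    ("rain", "yes", "no"), ("overcast", "no", "yes"), ("overcast", "yes", "yes"),
]

def naive_bayes_posterior(query, data):
    classes = Counter(row[-1] for row in data)
    scores = {}
    for c, count in classes.items():
        score = count / len(data)                       # prior P(c)
        rows_c = [row for row in data if row[-1] == c]
        for i, value in enumerate(query):               # likelihoods P(xi | c)
            score *= sum(row[i] == value for row in rows_c) / len(rows_c)
        scores[c] = score
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}    # normalise

print(naive_bayes_posterior(("sunny", "no"), data))     # favours "yes"
```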
  • 85. Decision trees ❑ Decision Tree is the most powerful and popular tool for classification and prediction. A Decision tree is a flowchart-like tree structure, where each internal node denotes a test on an attribute, each branch represents an outcome of the test, and each leaf node (terminal node) holds a class label. https://youtu.be/RmajweUFKvM
  • 87. Important Terminology related to Decision Trees 1. Root Node: It represents the entire population or sample, and it further gets divided into two or more homogeneous sets. 2. Splitting: The process of dividing a node into two or more sub-nodes. 3. Decision Node: When a sub-node splits into further sub-nodes, it is called a decision node. 4. Leaf / Terminal Node: Nodes that do not split are called leaf or terminal nodes. 5. Pruning: Removing sub-nodes of a decision node is called pruning; it is the opposite of splitting. 6. Branch / Sub-Tree: A subsection of the entire tree is called a branch or sub-tree. 7. Parent and Child Node: A node that is divided into sub-nodes is called the parent node of those sub-nodes, and the sub-nodes are its children.
  • 89. Assumptions while creating a Decision Tree • In the beginning, the whole training set is considered as the root. • Feature values are preferred to be categorical; if the values are continuous, they are discretized prior to building the model. • Records are distributed recursively on the basis of attribute values. • The order in which attributes are placed as the root or as internal nodes of the tree is decided using a statistical approach.
  • 90. Decision trees expressivity • Decision Trees follow a Sum of Products (SOP) representation. The Sum of Products (SOP) is also known as Disjunctive Normal Form. For a class, every branch from the root of the tree to a leaf node of that class is a conjunction (product) of attribute-value tests, and the different branches ending in that class form a disjunction (sum).
  • 91. A Decision Tree for the concept PlayTennis • This tree classifies Saturday mornings according to whether or not they are suitable for playing tennis.
  • 92. Decision trees expressivity • Decision trees represent a disjunction of conjunctions of constraints on the attribute values of instances. For the PlayTennis tree above, the represented expression is: (Outlook = Sunny ∧ Humidity = Normal) ∨ (Outlook = Overcast) ∨ (Outlook = Rain ∧ Wind = Weak).
  • 94. How do Decision Trees work? • Decision trees use multiple algorithms to decide to split a node into two or more sub-nodes. The creation of sub-nodes increases the homogeneity of resultant sub-nodes. In other words, we can say that the purity of the node increases with respect to the target variable.
  • 95. Algorithms used in Decision Trees: • ID3 (Iterative Dichotomiser 3) • C4.5 (successor of ID3) • CART (Classification And Regression Trees) • CHAID (Chi-square Automatic Interaction Detection; performs multi-level splits when computing classification trees) • MARS (Multivariate Adaptive Regression Splines)
  • 96. Steps in the ID3 algorithm: 1. It begins with the original set S as the root node. 2. On each iteration, the algorithm iterates through every unused attribute of the set S and calculates the Entropy (H) and Information Gain (IG) of that attribute. 3. It then selects the attribute with the smallest entropy or, equivalently, the largest information gain. 4. The set S is then split by the selected attribute to produce subsets of the data. 5. The algorithm continues to recurse on each subset, considering only attributes never selected before. A compact sketch of this loop is given below.
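  A compact, illustrative sketch of the ID3 loop on categorical data (plain Python; the toy dataset and attribute names are invented, and refinements such as handling continuous attributes or missing values are omitted):

```python
import math
from collections import Counter

def entropy(labels):
    counts, total = Counter(labels), len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def info_gain(rows, labels, attr):
    total, remainder = len(rows), 0.0
    for value in set(row[attr] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
        remainder += (len(subset) / total) * entropy(subset)
    return entropy(labels) - remainder

def id3(rows, labels, attributes):
    if len(set(labels)) == 1:                      # pure node -> leaf
        return labels[0]
    if not attributes:                             # no attributes left -> majority leaf
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: info_gain(rows, labels, a))
    tree = {best: {}}
    for value in set(row[best] for row in rows):
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        tree[best][value] = id3([rows[i] for i in idx],
                                [labels[i] for i in idx],
                                [a for a in attributes if a != best])
    return tree

rows = [{"outlook": "sunny", "wind": "weak"}, {"outlook": "sunny", "wind": "strong"},
        {"outlook": "rain", "wind": "weak"}, {"outlook": "rain", "wind": "strong"}]
labels = ["yes", "no", "yes", "no"]
print(id3(rows, labels, ["outlook", "wind"]))      # splits on "wind" here
```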
  • 97. Attribute Selection Measures • If the dataset consists of N attributes, deciding which attribute to place at the root or at the different levels of the tree as internal nodes is a complicated step. Randomly selecting a node to be the root cannot solve the issue; a random approach may give bad results with low accuracy.
  • 98. Attribute Selection Measures • Entropy, Information Gain, Gini Index, Gain Ratio, Reduction in Variance, Chi-Square
  • 99. Entropy • Entropy is a measure of the randomness in the information being processed. The higher the entropy, the harder it is to draw any conclusions from that information. Flipping a coin is an example of an action that provides information that is random. •
  • 100. • Entropy(S) = − Σi pi log2 pi, where S is the current state and pi is the probability of event (class) i in state S, i.e. the percentage of class i in a node of state S.
  • 101. Entropy definition • If the target attribute can take on c different values, then the entropy of S relative to this c-wise classification is defined as Entropy(S) = Σ(i=1..c) − pi log2 pi, where pi is the proportion of S belonging to class i.
  • 103. Entropy in binary classification • Entropy measures the impurity of a collection of examples. It depends on the distribution of the random variable p. – S is a collection of training examples – p+ the proportion of positive examples in S – p– the proportion of negative examples in S Entropy(S) ≡ – p+ log2 p+ – p– log2 p– [with 0 log2 0 = 0] Entropy([14+, 0–]) = – 14/14 log2 (14/14) – 0 log2 (0) = 0 Entropy([9+, 5–]) = – 9/14 log2 (9/14) – 5/14 log2 (5/14) = 0.94 Entropy([7+, 7–]) = – 7/14 log2 (7/14) – 7/14 log2 (7/14) = 1/2 + 1/2 = 1 [log2 1/2 = – 1] Note: the log of a number < 1 is negative; with 0 ≤ p ≤ 1, we have 0 ≤ entropy ≤ 1.
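  These worked values can be checked with a few lines of plain Python (illustrative only):

```python
import math

def entropy(p_pos, p_neg):
    terms = [p * math.log2(p) for p in (p_pos, p_neg) if p > 0]   # 0*log2(0) = 0
    return -sum(terms)

print(entropy(14/14, 0/14))   # 0.0
print(entropy(9/14, 5/14))    # ~0.940
print(entropy(7/14, 7/14))    # 1.0
```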
  • 105. Information Gain • Information gain or IG is a statistical property that measures how well a given attribute separates the training examples according to their target classification. Constructing a decision tree is all about finding an attribute that returns the highest information gain and the smallest entropy.
  • 108. Entropy calculation – Here the percentage of students who play cricket is 0.5 and the percentage of students who do not play cricket is of course also 0.5. – Since the log of 0.5 to base two is −1, the entropy of this node is 1.
  • 109. Entropy calculation in a pure node – Entropy is zero here. Lower entropy means a purer node; higher entropy means a less pure node.
  • 110. Information gain as entropy reduction • A measure of the effectiveness of an attribute in classifying the training data is called information gain. • It is the expected reduction in entropy caused by partitioning the examples according to this attribute. • The information gain Gain(S, A) of an attribute A relative to a collection of examples S is defined as Gain(S, A) = Entropy(S) − Σ(v ∈ Values(A)) (|Sv| / |S|) · Entropy(Sv), where Values(A) is the set of possible values of A and Sv is the subset of S for which attribute A has value v.
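  A quick check of the formula in plain Python, using the classic PlayTennis “Wind” attribute as the example: S = [9+, 5–], Wind = Weak → [6+, 2–], Wind = Strong → [3+, 3–].

```python
import math

def entropy(pos, neg):
    total = pos + neg
    terms = [p / total * math.log2(p / total) for p in (pos, neg) if p > 0]
    return -sum(terms)

# Gain(S, Wind) = Entropy(S) - (8/14)*Entropy(Weak) - (6/14)*Entropy(Strong)
gain = entropy(9, 5) - (8/14) * entropy(6, 2) - (6/14) * entropy(3, 3)
print(round(gain, 3))   # ~0.048
```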
  • 111. • https://youtu.be/coOTEc-0OGw • Decision Tree | ID3 Algorithm | Solved Numerical Example | https://youtu.be/fs0wsU2sSPQ • How to build a decision Tree for Boolean Function |
  • 114. Which attribute is the best classifier?
  • 115. Gini Index It is calculated by subtracting the sum of the squared probabilities of each class from one: Gini = 1 − Σi pi². It favors larger partitions and is easy to compute, whereas information gain favors smaller partitions with distinct values.
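  A quick illustrative check of the formula in plain Python:

```python
def gini(proportions):
    # 1 minus the sum of squared class probabilities
    return 1 - sum(p ** 2 for p in proportions)

print(gini([1.0, 0.0]))    # 0.0  (pure node)
print(gini([0.5, 0.5]))    # 0.5  (maximally impure binary node)
print(gini([9/14, 5/14]))  # ~0.459
```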
  • 116. How to avoid/counter Overfitting in Decision Trees? • Building trees that “adapt too much” to the training examples may lead to overfitting. • Here are two ways to reduce overfitting: 1. Pruning Decision Trees. 2. Random Forest.
  • 117. Pruning Decision Trees • The splitting process results in fully grown trees until the stopping criteria are reached. But, the fully grown tree is likely to overfit the data, leading to poor accuracy on unseen data.
  • 119. Pruning Decision Trees • In pruning, you trim off branches of the tree, i.e., remove decision nodes starting from the leaf nodes, in such a way that the overall accuracy is not disturbed. This is done by segregating the actual training set into two sets: a training data set D and a validation data set V. Build the decision tree using the training data set D, then continue trimming the tree so as to optimize the accuracy on the validation data set V.
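  As a hedged sketch (scikit-learn assumed), the snippet below follows the spirit of this procedure: grow a tree on a training split D and choose how much to prune by accuracy on a held-out validation split V. Pruning strength is controlled here by scikit-learn's cost-complexity parameter ccp_alpha, which is a stand-in for the validation-based pruning described above rather than the exact method.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_D, X_V, y_D, y_V = train_test_split(X, y, test_size=0.3, random_state=0)

# Candidate pruning strengths from the cost-complexity pruning path of the full tree.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_D, y_D)

best_alpha, best_acc = 0.0, 0.0
for alpha in path.ccp_alphas:
    alpha = max(float(alpha), 0.0)   # guard against tiny negative values from rounding
    tree = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X_D, y_D)
    acc = tree.score(X_V, y_V)       # accuracy on the validation set V
    if acc > best_acc:
        best_alpha, best_acc = alpha, acc
print("chosen ccp_alpha:", best_alpha, "validation accuracy:", best_acc)
```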
  • 120. Pruning Decision Trees • In the example, the ‘Age’ attribute on the left-hand side of the tree has been pruned, as it has more importance on the right-hand side of the tree; this reduces overfitting.
  • 121. Random Forest • Random Forest is an example of ensemble learning, in which we combine multiple machine learning algorithms to obtain better predictive performance. • Why the name “Random”? • Two key concepts that give it the name random: 1. A random sampling of training data set when building trees. 2. Random subsets of features considered when splitting nodes. • The random forest algorithm solves the above challenge by combining the predictions made by multiple decision trees and returning a single output. This is done using an extension of a technique called bagging, or bootstrap aggregation.
  • 122. Random Forest • Bagging is a procedure that is applied to reduce the variance of machine learning models. It works by averaging a set of observations to reduce variance. • https://youtu.be/eM4uJ6XGnSM
  • 123. Random Forest – Here is how bagging works:
  • 124. Bootstrap • If we had more than one training dataset, we could train multiple decision trees on each dataset and average the results. • However, since we usually only have one training dataset in most real-world scenarios, a statistical technique called bootstrap is used to sample the dataset with replacement. • Then, multiple decision trees are created, and each tree is trained on a different data sample:
  • 126. Aggregation • In this step, the prediction of each decision tree will be combined to come up with a single output. • In the case of a classification problem, a majority class prediction is made:
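  A minimal illustrative sketch of bootstrap sampling plus majority-vote aggregation (scikit-learn assumed for the base trees; the data, number of trees, and sample sizes are made up):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, random_state=0)
rng = np.random.default_rng(0)

trees = []
for _ in range(25):                                   # 25 bootstrap samples
    idx = rng.integers(0, len(X), size=len(X))        # sample rows with replacement
    trees.append(DecisionTreeClassifier().fit(X[idx], y[idx]))

# Aggregation: majority vote across the individual tree predictions.
votes = np.stack([t.predict(X[:5]) for t in trees])   # shape (n_trees, 5)
majority = (votes.mean(axis=0) >= 0.5).astype(int)
print("bagged predictions:", majority)
```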
  • 127. Why do we randomly sample variables in the random forest algorithm? • In the random forest algorithm, it is not only rows that are randomly sampled, but variables (features) too. • This is because if we were to build multiple decision trees with the same features, every tree would be similar and highly correlated with the others, potentially yielding the same results. This would again lead to the issue of high variance.
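  As an illustration (scikit-learn assumed; hyperparameter values are made up), the random forest below combines both ideas: bootstrapped rows (bagging) and a random subset of features considered at each split (max_features):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

forest = RandomForestClassifier(
    n_estimators=100,      # number of bootstrapped trees
    max_features="sqrt",   # random subset of features considered at each split
    random_state=0,
)
print("cv accuracy:", cross_val_score(forest, X, y, cv=5).mean())
```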
  • 128. Decision Trees vs. Random Forests - Which One Is Better and Why? • Random forests typically perform better than decision trees for the following reasons: • Random forests reduce the problem of overfitting because they combine the output of multiple decision trees to come up with a final prediction. • When you build a single decision tree, a small change in the data can lead to a large difference in the model’s prediction. With a random forest, this problem is reduced, since the data is sampled many times before generating a prediction. • In terms of speed, however, random forests are slower, since more time is taken to construct multiple decision trees. Adding more trees to a random forest model improves its accuracy to a certain extent, but also increases computation time.
  • 129. Decision Trees vs. Random Forests - Which One Is Better and Why? • Decision trees are also easier to interpret than random forests, since they are straightforward: it is easy to visualize a decision tree and understand how the algorithm reached its outcome. A random forest is harder to deconstruct, since it is more complex and combines the output of multiple decision trees to make a prediction.
  • 130. Example: Random Forest – Suppose there is a dataset that contains multiple fruit images. So, this dataset is given to the Random forest classifier. The dataset is divided into subsets and given to each decision tree. During the training phase, each decision tree produces a prediction result, and when a new data point occurs, then based on the majority of results, the Random Forest classifier predicts the final decision
  • 132. Applications of Random Forest • Banking: Banking sector mostly uses this algorithm for the identification of loan risk. • Medicine: With the help of this algorithm, disease trends and risks of the disease can be identified. • Land Use: We can identify the areas of similar land use by this algorithm. • Marketing: Marketing trends can be identified using this algorithm.
  • 133. Issues in decision trees learning • determining how deeply to grow the decision tree, • handling continuous attributes, • choosing an appropriate attribute selection measure, • handling training data with missing attribute values, • handling attributes with differing costs, • improving computational efficiency.