1. Noida Institute of Engineering and Technology, Greater Noida
PROBABILISTIC LEARNING & ENSEMBLE
11/5/2023
Dr. Hitesh Singh KCS 055 ML Unit 3
1
Dr. Hitesh Singh
Associate Professor
IT DEPARTMENT
Unit: 4
MACHINE LEARNING
B Tech 5th Sem Section A & B
2. CONTENT
Brief Introduction of Faculty
I am pleased to introduce myself as Dr. Hitesh Singh, presently associated with NIET, Greater Noida as Assistant Professor in the IT Department. I completed my Ph.D. degree under the supervision of Boncho Bonev (PhD), Technical University of Sofia, Sofia, Bulgaria, in 2019. My research interests are radio wave propagation and machine learning, and I have rich experience with millimetre-wave technologies.
I began my research career in 2009 and have since published research articles in SCI/Scopus-indexed journals and conferences (Springer, IEEE, Elsevier). I have presented research work at reputed international conferences such as the IEEE International Conference on Infocom Technologies and Unmanned Systems (ICTUS'2017), Dubai, and ELECTRONICA, Sofia. Four patents and two book chapters (Elsevier Publication) have been published under my inventorship and authorship.
4. THE CONCEPT LEARNING TASK
Subject Syllabus
5. THE CONCEPT LEARNING TASK
Subject Syllabus
6. THE CONCEPT LEARNING TASK
Text Books
7. THE CONCEPT LEARNING TASK
Branch Wise Applications
8. THE CONCEPT LEARNING TASK
Course Objective
• To introduce students to the basic concepts of Machine Learning.
• To develop skills of implementing machine learning for solving
practical problems.
• To gain experience of doing independent study and research related
to Machine Learning
9. THE CONCEPT LEARNING TASK
Course Outcome
At the end of the semester, the student will be able to:

CO    Description                                                              Bloom's Taxonomy
CO1   Understand the utilization and implementation of a proper
      machine learning algorithm.                                              K2
CO2   Understand the basic supervised machine learning algorithms.             K2
CO3   Understand the difference between supervised and unsupervised
      learning.                                                                K2
CO4   Understand algorithmic topics of machine learning, mathematically
      deep enough to introduce the required theory.                            K2
CO5   Apply an appreciation for what is involved in learning from data.        K3
10. CONTENT
1. Engineering knowledge:
2. Problem analysis:
3. Design/development of solutions:
4. Conduct investigations of complex problems:
5. Modern tool usage:
6. The engineer and society:
7. Environment and sustainability:
8. Ethics:
9. Individual and team work:
10. Communication:
11. Project management and finance:
12. Life-long learning
Program Outcome
12. THE CONCEPT LEARNING TASK
Program Specific Outcomes
• PSO1: Work as a software developer, database
administrator, tester or networking engineer for
providing solutions to the real world and industrial
problems.
• PSO2: Apply core subjects of information technology
related to data structure and algorithm, software
engineering, web technology, operating system, database
and networking to solve complex IT problems.
• PSO3: Practice multi-disciplinary and modern computing
techniques by lifelong learning to establish innovative
career.
• PSO4: Work in a team or individual to manage projects
with ethical concern to be a successful employee or
employer in IT industry.
13. THE CONCEPT LEARNING TASK
CO-PO and PSO Mapping
Matrix of CO/PSO:

           PSO1  PSO2  PSO3  PSO4
RCS080.1    3     2     3     1
RCS080.2    3     2     2     3
RCS080.3    3     2     3     2
RCS080.4    2     1     1     1
RCS080.5    2     2     1     2
AVG        2.6   1.8    2    1.8
14. THE CONCEPT LEARNING TASK
Program Educational Objectives
• PEO1: Able to apply sound knowledge in the field of information technology to fulfil the needs of the IT industry.
• PEO2: Able to design innovative and interdisciplinary systems through the latest digital technologies.
• PEO3: Able to inculcate professional and social ethics, teamwork and leadership for serving the society.
• PEO4: Able to inculcate lifelong learning in the field of computing for a successful career in organizations and R&D sectors.
15. THE CONCEPT LEARNING TASK
Result Analysis
• ML Result of 2020-21: 89.39%
• Average Marks: 46.05
16. THE CONCEPT LEARNING TASK
End Semester Question Paper Template
17. THE CONCEPT LEARNING TASK
Prerequisites:
• Statistics.
• Linear Algebra.
• Calculus.
• Probability.
• Programming Languages.
Prerequisite
18. THE CONCEPT LEARNING TASK
Brief Introduction to Subject
https://www.youtube.com/watch?v=PPLop4L2eGk&list=PLLssT5z_DsK-h9vYZkQkYNWcItqhlRJLN
19. THE CONCEPT LEARNING TASK
Topic Mapping with Course Outcome

Topic                       Course Outcome
Bayesian Learning           CO4
Bayes Optimal Classifier    CO4
Naïve Bayes Classifier      CO4
Bayesian Belief Networks    CO4
20. THE CONCEPT LEARNING TASK
11/5/2023 Gaurav Kumar RCS080 and ML Unit 1 20
Lecture Plan
25. CONTENT
➢ Unit 4 Content:
Bayesian Learning, Bayes Optimal Classifier, Naïve Bayes Classifier, Bayesian Belief Networks.
Ensemble methods: Bagging & Boosting, C5.0 boosting, Random Forest, Gradient Boosting Machines and XGBoost.
26. THE CONCEPT LEARNING TASK
Unit Objective
The objectives of Unit 4 are:
1. To understand the basics of Bayesian learning.
2. To understand clearly the concept of the Bayes Optimal Classifier.
3. To give a brief introduction to the Naïve Bayes algorithm.
4. To use various approaches of ensemble methods.
27. THE CONCEPT LEARNING TASK
Topic Objective
Students will be able to understand:
• Bayes' Theorem
• Bayes Classifier
• Naïve Bayes Classifier
29. THE CONCEPT LEARNING TASK
BAYESIAN LEARNING (CO1)
32. THE CONCEPT LEARNING TASK
BAYESIAN LEARNING (CO1)
Bayes Theorem for Modeling Hypotheses
• Bayes Theorem is a useful tool in applied machine learning.
• It provides a way of thinking about the relationship between data and a
model.
• A machine learning algorithm or model is a specific way of thinking about
the structured relationships in the data.
• In this way, a model can be thought of as a hypothesis about the
relationships in the data, such as the relationship between input (X) and
output (y).
• The practice of applied machine learning is the testing and analysis of
different hypotheses (models) on a given dataset.
33. THE CONCEPT LEARNING TASK
BAYESIAN LEARNING (CO1)
Bayes theorem provides a way to calculate the probability of a hypothesis based on its
prior probability, the probabilities of observing various data given the hypothesis, and
the observed data itself.
• Under this framework, Bayes theorem reads P(h|D) = P(D|h) * P(h) / P(D), and each piece of the calculation has a specific name:
• P(h|D): Posterior probability of the hypothesis (the thing we want to calculate).
• P(D|h): Likelihood of the data given the hypothesis.
• P(h): Prior probability of the hypothesis.
• P(D): Probability of the data.
• This gives a useful framework for thinking about and modeling a machine learning
problem.
• If we have some prior domain knowledge about the hypothesis, this is captured in
the prior probability. If we don’t, then all hypotheses may have the same prior
probability.
34. THE CONCEPT LEARNING TASK
BAYESIAN LEARNING (CO1)
• If the probability of observing the data P(D) increases, then the probability of the
hypothesis holding given the data P(h|D) decreases.
• Conversely, if the probability of the hypothesis P(h) and the probability of
observing the data given hypothesis increases, the probability of the hypothesis
holding given the data P(h|D) increases.
• The notion of testing different models on a dataset in applied machine learning
can be thought of as estimating the probability of each hypothesis (h1, h2, h3, … in
H) being true given the observed data.
• The optimization or seeking the hypothesis with the maximum posterior
probability in modeling is called maximum a posteriori or MAP for short.
35. THE CONCEPT LEARNING TASK
BAYESIAN LEARNING (CO1)
• Under this framework, the probability of the data (D) is constant as it is
used in the assessment of each hypothesis.
• Therefore, it can be removed from the calculation to give the simplified
unnormalized estimate as follows:
• h_MAP = argmax over h in H of P(D|h) * P(h)
• If we do not have any prior information about the hypotheses being tested, they can be assigned a uniform probability, and this term too will be a constant and can be removed from the calculation to give the following:
• h_MAP = argmax over h in H of P(D|h)
• That is, the goal is to locate a hypothesis that best explains the observed
data.
• Fitting models like linear regression for predicting a numerical value, and
logistic regression for binary classification can be framed and solved under
the MAP probabilistic framework.
• This provides an alternative to the more common maximum likelihood
estimation (MLE) framework.
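The MAP idea above can be sketched in a few lines of Python. The coin-bias hypotheses, prior values, and observed flips below are invented purely for illustration:

```python
# Illustrative sketch: choosing the MAP hypothesis for a coin's bias
# from observed flips. Hypotheses, priors, and data are made up.
from math import prod

hypotheses = [0.3, 0.5, 0.8]                    # candidate P(heads), h in H
priors = {0.3: 0.25, 0.5: 0.5, 0.8: 0.25}       # prior P(h)
data = [1, 1, 0, 1, 1]                          # observed flips: 1 = heads

def likelihood(h, data):
    # P(D|h): product of per-flip probabilities under bias h
    return prod(h if d == 1 else (1 - h) for d in data)

# Unnormalized posterior P(h|D) is proportional to P(D|h) * P(h);
# P(D) is constant over h, so it can be dropped.
scores = {h: likelihood(h, data) * priors[h] for h in hypotheses}
map_h = max(scores, key=scores.get)
print(map_h)
```

Dropping the `priors[h]` factor from the score reduces this to the MLE choice.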
36. THE CONCEPT LEARNING TASK
BAYESIAN LEARNING (CO1)
• Bayes Theorem for Classification
• Classification is a predictive modeling problem that involves assigning a label to a given input
data sample.
• The problem of classification predictive modeling can be framed as calculating the
conditional probability of a class label given a data sample, for example:
• P(class|data) = (P(data|class) * P(class)) / P(data)
• Where P(class|data) is the probability of class given the provided data.
• This calculation can be performed for each class in the problem and the class that is assigned
the largest probability can be selected and assigned to the input data.
• In practice, it is very challenging to calculate full Bayes Theorem for classification.
• The priors for the class and the data are easy to estimate from a training dataset, if the dataset is suitably representative of the broader problem.
• Estimating the conditional probability of the observation given the class, P(data|class), is not feasible unless the number of examples is extraordinarily large, e.g. large enough to effectively estimate the probability distribution for all different possible combinations of values. This is almost never the case; we will not have sufficient coverage of the domain.
• As such, the direct application of Bayes Theorem also becomes intractable, especially as the
number of variables or features (n) increases.
37. THE CONCEPT LEARNING TASK
BAYES OPTIMAL CLASSIFIER (CO1)
BAYES OPTIMAL CLASSIFIER
• The Bayes optimal classifier is a probabilistic model that makes the most probable prediction for a new example.
• Bayes theorem: P(A|B) = P(B|A) * P(A) / P(B)
• For a data set X = {x1, x2, ..., xn} with label y (y = yes/no):
• P(y|x1, x2, ..., xn) = [P(x1|y) * P(x2|y) * ... * P(xn|y)] * P(y) / [P(x1) * P(x2) * ... * P(xn)]
•                     = P(y) * Π_{i=1..n} P(xi|y) / [P(x1) * P(x2) * ... * P(xn)]
• Since the denominator is constant for a given input:
• P(y|x1, x2, ..., xn) ∝ P(y) * Π_{i=1..n} P(xi|y)
41. THE CONCEPT LEARNING TASK
Naïve Bayes Classifier Algorithm (CO1)
• The Naïve Bayes algorithm is a supervised learning algorithm, based on Bayes theorem and used for solving classification problems.
• It is mainly used in text classification with high-dimensional training datasets.
• The Naïve Bayes classifier is one of the simplest and most effective classification algorithms; it helps in building fast machine learning models that can make quick predictions.
• It is a probabilistic classifier, which means it predicts on the basis of the probability of an object.
• Some popular applications of the Naïve Bayes algorithm are spam filtering, sentiment analysis, and classifying articles.
42. THE CONCEPT LEARNING TASK
Naïve Bayes Classifier Algorithm(CO1)
• The name Naïve Bayes is composed of two words, Naïve and Bayes, which can be described as:
• Naïve: It is called naïve because it assumes that the occurrence of a certain feature is independent of the occurrence of other features. For example, if a fruit is identified on the basis of colour, shape, and taste, then a red, spherical, and sweet fruit is recognized as an apple; each feature individually contributes to identifying it as an apple, without depending on the others.
• Bayes: It is called Bayes because it depends on the principle of Bayes' theorem.
43. THE CONCEPT LEARNING TASK
BAYES THEOREM (CO1)
• Bayes' theorem is also known as Bayes' Rule or Bayes' law, which is used to
determine the probability of a hypothesis with prior knowledge. It depends on the
conditional probability.
• The formula for Bayes' theorem is: P(A|B) = P(B|A) * P(A) / P(B)
• Where,
• P(A|B) is Posterior probability: Probability of hypothesis A on the observed event B.
• P(B|A) is Likelihood probability: Probability of the evidence given that the hypothesis is true.
• P(A) is Prior Probability: Probability of hypothesis before observing the evidence.
• P(B) is Marginal Probability: Probability of Evidence.
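A quick numeric check of the four quantities, with hypothetical numbers chosen only to exercise the formula:

```python
# Hypothetical values: plug into P(A|B) = P(B|A) * P(A) / P(B).
p_b_given_a = 0.9   # likelihood P(B|A)
p_a = 0.2           # prior P(A)
p_b = 0.3           # marginal probability of the evidence P(B)

p_a_given_b = p_b_given_a * p_a / p_b   # posterior P(A|B)
print(round(p_a_given_b, 2))
```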
44. THE CONCEPT LEARNING TASK
Working of Naïve Bayes' Classifier (CO1)
• Working of Naïve Bayes' Classifier can be understood with the help of the
below example:
• Suppose we have a dataset of weather conditions and a corresponding target variable "Play". Using this dataset, we need to decide whether we should play on a particular day according to the weather conditions. To solve this problem, we follow the steps below:
1. Convert the given dataset into frequency tables.
2. Generate Likelihood table by finding the probabilities of given features.
3. Now, use Bayes theorem to calculate the posterior probability.
• Problem: If the weather is sunny, then the Player should play or not?
• Solution: To solve this, first consider the below dataset:
45. THE CONCEPT LEARNING TASK
BAYESIAN LEARNING (CO1)
#    Outlook    Play
0    Rainy      Yes
1    Sunny      Yes
2    Overcast   Yes
3    Overcast   Yes
4    Sunny      No
5    Rainy      Yes
6    Sunny      Yes
7    Overcast   Yes
8    Rainy      No
9    Sunny      No
10   Sunny      Yes
11   Rainy      No
12   Overcast   Yes
13   Overcast   Yes
46. THE CONCEPT LEARNING TASK
BAYESIAN LEARNING (CO1)
Frequency table for the weather conditions:

Weather     Yes   No
Overcast     5     0
Rainy        2     2
Sunny        3     2
Total       10     4
47. THE CONCEPT LEARNING TASK
BAYESIAN LEARNING (CO1)
• Likelihood table for the weather conditions:

Weather     No            Yes
Overcast    0             5             5/14 = 0.35
Rainy       2             2             4/14 = 0.29
Sunny       2             3             5/14 = 0.35
All         4/14 = 0.29   10/14 = 0.71
48. THE CONCEPT LEARNING TASK
BAYESIAN LEARNING (CO1)
• Applying Bayes' theorem:
• P(Yes|Sunny)= P(Sunny|Yes)*P(Yes)/P(Sunny)
• P(Sunny|Yes)= 3/10= 0.3
• P(Sunny)= 0.35
• P(Yes)=0.71
• So P(Yes|Sunny) = 0.3*0.71/0.35= 0.60
• P(No|Sunny)= P(Sunny|No)*P(No)/P(Sunny)
• P(Sunny|NO)= 2/4=0.5
• P(No)= 0.29
• P(Sunny)= 0.35
• So P(No|Sunny)= 0.5*0.29/0.35 = 0.41
• As we can see from the above calculation, P(Yes|Sunny) > P(No|Sunny).
• Hence, on a sunny day, the player can play the game.
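The calculation above can be reproduced directly from the frequency table. (The slide's 0.60 and 0.41 use rounded intermediate values; the exact ratios come out to 0.60 and 0.40, so the conclusion is unchanged.)

```python
# Reproducing the weather example from the frequency table:
# Sunny: 3 Yes, 2 No; totals: 10 Yes, 4 No out of 14 days.
p_sunny_yes = 3 / 10          # P(Sunny|Yes)
p_sunny_no = 2 / 4            # P(Sunny|No)
p_yes, p_no = 10 / 14, 4 / 14 # P(Yes), P(No)
p_sunny = 5 / 14              # P(Sunny)

yes_given_sunny = p_sunny_yes * p_yes / p_sunny   # P(Yes|Sunny)
no_given_sunny = p_sunny_no * p_no / p_sunny      # P(No|Sunny)
print(round(yes_given_sunny, 2), round(no_given_sunny, 2))
```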
49. THE CONCEPT LEARNING TASK
Introduction (CO1)
Advantages of Naïve Bayes Classifier:
• Naïve Bayes is one of the fastest and easiest ML algorithms for predicting the class of a dataset.
• It can be used for Binary as well as Multi-class Classifications.
• It performs well in Multi-class predictions as compared to the
other Algorithms.
• It is the most popular choice for text classification problems.
Disadvantages of Naïve Bayes Classifier:
• Naive Bayes assumes that all features are independent or
unrelated, so it cannot learn the relationship between
features.
50. THE CONCEPT LEARNING TASK
Introduction (CO1)
Applications of Naïve Bayes Classifier:
• It is used for Credit Scoring.
• It is used in medical data classification.
• It can be used in real-time predictions because Naïve
Bayes Classifier is an eager learner.
• It is used in Text classification such as Spam filtering
and Sentiment analysis.
51. THE CONCEPT LEARNING TASK
Introduction (CO1)
Types of Naïve Bayes Model:
• There are three types of Naive Bayes Model, which are given below:
• Gaussian: The Gaussian model assumes that features follow a normal distribution.
This means if predictors take continuous values instead of discrete, then the model
assumes that these values are sampled from the Gaussian distribution.
• Multinomial: The Multinomial Naïve Bayes classifier is used when the data is multinomially distributed. It is primarily used for document classification problems, i.e., determining which category a particular document belongs to, such as Sports, Politics, or Education.
The classifier uses the frequency of words as the predictors.
• Bernoulli: The Bernoulli classifier works similarly to the Multinomial classifier, but the predictor variables are independent Boolean variables, such as whether a particular word is present or not in a document. This model is also well known for document classification tasks.
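To make the "Gaussian" variant concrete, here is a minimal hand-rolled sketch for one continuous feature; the class means, variances, and priors below are invented for illustration. In practice one would typically use ready-made implementations such as scikit-learn's GaussianNB, MultinomialNB, and BernoulliNB.

```python
# Minimal hand-rolled Gaussian Naive Bayes on one continuous feature.
# Per-class (mean, variance) pairs and priors are made-up training stats.
from math import exp, pi, sqrt

def gauss_pdf(x, mu, var):
    # density of a normal distribution with mean mu and variance var
    return exp(-(x - mu) ** 2 / (2 * var)) / sqrt(2 * pi * var)

stats = {"yes": (20.0, 4.0), "no": (30.0, 9.0)}   # (mean, variance)
priors = {"yes": 0.5, "no": 0.5}

def predict(x):
    # score each class by prior * class-conditional density, pick the max
    scores = {c: priors[c] * gauss_pdf(x, mu, var)
              for c, (mu, var) in stats.items()}
    return max(scores, key=scores.get)

print(predict(22.0))   # a value near the "yes" class mean
```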
52. THE CONCEPT LEARNING TASK
Bayesian Belief Network (CO1)
• Why Bayesian belief networks?
• To represent the probabilistic relationships between different classes.
• To model dependences between attribute values through a joint conditional probability distribution.
• In the Naïve Bayes classifier, attributes are assumed conditionally independent.
53. THE CONCEPT LEARNING TASK
Bayesian Belief Network (CO1)
58. THE CONCEPT LEARNING TASK
Bayesian Belief Network EXAMPLE (CO1)
66. THE CONCEPT LEARNING TASK
Expectation-Maximization Algorithm (CO1)
• Expectation-Maximization Algorithm
• In real-world applications of machine learning, it is very common that many relevant features are available for learning, but only a small subset of them is observable.
• For variables that are sometimes observable and sometimes not, we can use the instances in which the variable is observed for the purpose of learning, and then predict its value in the instances in which it is not observable.
67. THE CONCEPT LEARNING TASK
E-M Algorithm (CO1)
• On the other hand, the Expectation-Maximization algorithm can also be used for latent variables (variables that are not directly observable and are actually inferred from the values of the other observed variables) in order to predict their values, provided that the general form of the probability distribution governing those latent variables is known to us.
• This algorithm is actually at the base of many unsupervised clustering algorithms in
the field of machine learning.
• It was explained, proposed and given its name in a paper published in 1977 by
Arthur Dempster, Nan Laird, and Donald Rubin.
• It is used to find the local maximum likelihood parameters of a statistical model in
the cases where latent variables are involved and the data is missing or incomplete.
68. THE CONCEPT LEARNING TASK
E-M Algorithm (CO1)
• Algorithm:
1. Given a set of incomplete data, consider a set of
starting parameters.
2. Expectation step (E – step): Using the observed
available data of the dataset, estimate (guess) the
values of the missing data.
3. Maximization step (M – step): Complete data
generated after the expectation (E) step is used in
order to update the parameters.
4. Repeat step 2 and step 3 until convergence.
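The four steps above can be sketched for a two-component 1-D Gaussian mixture with fixed unit variances; the data points and starting parameters below are invented for illustration:

```python
# EM sketch for a two-component 1-D Gaussian mixture (unit variances).
# Only the two means and the mixing weight are re-estimated.
from math import exp, sqrt, pi

def pdf(x, mu):
    # unit-variance normal density centred at mu
    return exp(-(x - mu) ** 2 / 2) / sqrt(2 * pi)

data = [0.1, -0.2, 0.3, 4.8, 5.1, 5.3]
mu1, mu2, w = -1.0, 6.0, 0.5            # step 1: starting parameters

for _ in range(50):                     # step 4: iterate to convergence
    # E-step: responsibility of component 1 for each point (step 2)
    r = [w * pdf(x, mu1) / (w * pdf(x, mu1) + (1 - w) * pdf(x, mu2))
         for x in data]
    # M-step: re-estimate parameters from the "completed" data (step 3)
    mu1 = sum(ri * x for ri, x in zip(r, data)) / sum(r)
    mu2 = sum((1 - ri) * x for ri, x in zip(r, data)) / (len(data) - sum(r))
    w = sum(r) / len(data)

print(round(mu1, 1), round(mu2, 1))     # means settle near the two clusters
```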
70. THE CONCEPT LEARNING TASK
E-M Algorithm (CO1)
• The essence of Expectation-Maximization algorithm is to use the available
observed data of the dataset to estimate the missing data and then using that data
to update the values of the parameters. Let us understand the EM algorithm in
detail.
• Initially, a set of initial values of the parameters are considered. A set of
incomplete observed data is given to the system with the assumption that the
observed data comes from a specific model.
• The next step is known as “Expectation” – step or E-step. In this step, we use the
observed data in order to estimate or guess the values of the missing or
incomplete data. It is basically used to update the variables.
• The next step is known as “Maximization”-step or M-step. In this step, we use the
complete data generated in the preceding “Expectation” – step in order to update
the values of the parameters. It is basically used to update the hypothesis.
• Now, in the fourth step, it is checked whether the values are converging or not, if
yes, then stop otherwise repeat step-2 and step-3 i.e. “Expectation” – step and
“Maximization” – step until the convergence occurs.
72. THE CONCEPT LEARNING TASK
E-M Algorithm (CO1)
• Usage of EM algorithm –
• It can be used to fill the missing data in a sample.
• It can be used as the basis of unsupervised learning of
clusters.
• It can be used for the purpose of estimating the parameters of
Hidden Markov Model (HMM).
• It can be used for discovering the values of latent variables.
73. THE CONCEPT LEARNING TASK
E-M Algorithm (CO1)
Advantages of EM algorithm –
• It is always guaranteed that likelihood will increase with each iteration.
• The E-step and M-step are often pretty easy for many problems in terms
of implementation.
• Solutions to the M-steps often exist in the closed form.
Disadvantages of EM algorithm –
• It has slow convergence.
• It converges only to a local optimum.
• It requires both forward and backward probabilities (numerical optimization requires only the forward probability).
74. THE CONCEPT LEARNING TASK
Ensemble Methods (CO4)
• Ensemble learning is a machine learning paradigm where multiple models
(often called “weak learners”) are trained to solve the same problem and
combined to get better results.
• The main hypothesis is that when weak models are correctly combined we
can obtain more accurate and/or robust models.
• In machine learning, no matter if we are facing a classification or a
regression problem, the choice of the model is extremely important to
have any chance to obtain good results.
• This choice can depend on many variables of the problem: quantity of
data, dimensionality of the space, distribution hypothesis…
76. THE CONCEPT LEARNING TASK
Ensemble Methods (CO4)
• In ensemble learning theory, we call weak learners (or base models) models
that can be used as building blocks for designing more complex models by
combining several of them.
• Most of the time, these basic models do not perform so well by themselves, either because they have a high bias (low-degree-of-freedom models, for example) or because they have too much variance to be robust (high-degree-of-freedom models, for example).
• Then, the idea of ensemble methods is to try reducing bias and/or variance of
such weak learners by combining several of them together in order to create a
strong learner (or ensemble model) that achieves better performances.
77. THE CONCEPT LEARNING TASK
Ensemble Methods (CO4)
• Weak Learners: A ‘weak learner’ is any ML algorithm (for
regression/classification) that provides an accuracy slightly better than
random guessing.
79. THE CONCEPT LEARNING TASK
Ensemble Methods (CO4)
1. BAGGING
• Bagging stands for Bootstrap Aggregation.
• In real-life scenarios, we don’t have multiple different training
sets on which we can train our model separately and at the
end combine their result. Here, bootstrapping comes into the
picture.
• Bootstrapping is a technique of sampling different sets of data
from a given training set by using replacement. After
bootstrapping the training dataset, we train the model on all
the different sets and aggregate the result. This technique is
known as Bootstrap Aggregation or Bagging.
80. THE CONCEPT LEARNING TASK
Ensemble Methods (CO4)
• Definition: — Bagging is the type of ensemble technique in which a single
training algorithm is used on different subsets of the training data where
the subset sampling is done with replacement (bootstrap). Once the
algorithm is trained on all the subsets, then bagging predicts by
aggregating all the predictions made by the algorithm on different
subsets.
• For aggregating the outputs of base learners, bagging uses majority voting
(most frequent prediction among all predictions) for classification and
averaging (mean of all the predictions) for regression.
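The definition above can be sketched with decision stumps as base learners on toy 1-D data (all values below are invented), using majority voting for aggregation:

```python
# Minimal bagging sketch: one decision stump per bootstrap sample,
# prediction by majority vote. Data, labels, and seed are arbitrary.
import random

random.seed(0)
X = [1.0, 2.0, 3.0, 6.0, 7.0, 8.0]
y = [0, 0, 0, 1, 1, 1]

def fit_stump(xs, ys):
    # best threshold t minimizing training errors, predicting 1 when x > t
    best_t, best_err = None, len(xs) + 1
    for t in xs:
        err = sum((x > t) != bool(lab) for x, lab in zip(xs, ys))
        if err < best_err:
            best_t, best_err = t, err
    return best_t

stumps = []
for _ in range(11):                               # 11 bootstrap replicates
    idx = [random.randrange(len(X)) for _ in X]   # sample with replacement
    stumps.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))

def predict(x):
    votes = sum(x > t for t in stumps)            # majority vote over stumps
    return int(votes > len(stumps) / 2)

print(predict(1.5), predict(7.5))
```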
82. THE CONCEPT LEARNING TASK
Ensemble Methods (CO4)
Advantages of a Bagging Model:
1. Bagging significantly decreases the variance without
increasing bias.
2. Bagging methods work so well because of diversity in the
training data since the sampling is done by bootstrapping.
3. Also, if the training set is very large, bagging can save computational time by training each model on a relatively smaller data set while still increasing the accuracy of the ensemble.
4. Works well with small datasets as well.
83. THE CONCEPT LEARNING TASK
Ensemble Methods (CO4)
Disadvantages of a Bagging Model:
1. The main disadvantage of Bagging is that it improves the
accuracy of the model at the expense of interpretability i.e., if a
single tree was being used as the base model, then it would have
a more attractive and easily interpretable diagram, but with the
use of bagging this interpretability gets lost.
2. Another disadvantage of Bootstrap Aggregation is that during
sampling, we cannot interpret which features are being selected
i.e., there are chances that some features are never used, which
may result in a loss of important information.
84. THE CONCEPT LEARNING TASK
Ensemble Methods (CO4)
• Out of Bag Evaluation: -In bagging, when different
samples are collected, no sample contains all the
data but a fraction of the original dataset. There
might be some data that are never sampled at all.
The remaining data which are not sampled are called
out of bag instances.
• The Random Forest approach is a bagging method
where deep trees (Decision Trees), fitted on
bootstrap samples, are combined to produce an
output with lower variance.
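The out-of-bag idea can be checked numerically: in a bootstrap sample of size n, each point is missed with probability (1 - 1/n)^n, which approaches 1/e ≈ 36.8% for large n. A quick simulation (n and seed are arbitrary):

```python
# Estimate the average out-of-bag fraction over many bootstrap samples.
import random

random.seed(1)
n, trials, oob = 1000, 100, 0.0
for _ in range(trials):
    drawn = {random.randrange(n) for _ in range(n)}  # indices drawn at least once
    oob += (n - len(drawn)) / n                      # fraction never drawn
print(round(oob / trials, 2))                        # close to 1/e
```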
85. THE CONCEPT LEARNING TASK
Ensemble Methods (CO4)
2. BOOSTING
• Boosting models form a second family of ensemble methods.
• Boosting, initially named hypothesis boosting, consists of filtering or weighting the data used to train the team of weak learners, so that each new learner gives more weight to, or is only trained on, the observations that were poorly classified by the previous learners.
• By doing this our team of models learns to make accurate predictions on
all kinds of data, not just on the most common or easy observations. Also,
if one of the individual models is very bad at making predictions on some
kind of observation, it does not matter, as the other N-1 models will most
likely make up for it.
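The reweighting idea can be sketched with one AdaBoost-style round (a toy, hand-rolled illustration; `adaboost_reweight` is a hypothetical helper, not a library function). Misclassified points are scaled up, the rest are scaled down, and the weights are renormalized:

```python
import math

def adaboost_reweight(weights, misclassified):
    """One AdaBoost-style round: upweight the points the current weak
    learner got wrong, downweight the rest, then renormalize."""
    err = sum(w for w, m in zip(weights, misclassified) if m)
    alpha = 0.5 * math.log((1 - err) / err)      # learner's vote weight
    new = [w * math.exp(alpha if m else -alpha)
           for w, m in zip(weights, misclassified)]
    total = sum(new)
    return [w / total for w in new], alpha

w = [0.25, 0.25, 0.25, 0.25]                     # uniform starting weights
w, alpha = adaboost_reweight(w, [True, False, False, False])
print([round(x, 3) for x in w])                  # → [0.5, 0.167, 0.167, 0.167]
```

Note the known AdaBoost property this reproduces: after reweighting, the misclassified set carries exactly half the total weight, forcing the next learner to focus on it.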
86. Ensemble Methods (CO4)
• Definition: the term 'boosting' refers to a family of algorithms that convert weak learners into strong learners. Boosting is an ensemble method for improving the predictions of any given learning algorithm: weak learners are trained sequentially, each trying to correct the errors of its predecessor, and in the process the combined model becomes a strong learner.
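The sequential "correct your predecessor" loop can be illustrated with a minimal gradient-boosting-style sketch for regression, where each weak learner is a depth-1 stump fitted to the residuals left by the ensemble so far (toy data and helper names are ours, not a library API):

```python
def fit_stump(x, r):
    """Weak learner: the single-threshold split (depth-1 stump) that
    minimizes squared error against the residuals r."""
    best = None
    for t in sorted(set(x))[:-1]:                # last value has an empty right side
        left = [ri for xi, ri in zip(x, r) if xi <= t]
        right = [ri for xi, ri in zip(x, r) if xi > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((ri - (lm if xi <= t else rm)) ** 2 for xi, ri in zip(x, r))
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return lambda xi: lm if xi <= t else rm

def boost(x, y, rounds=20, lr=0.5):
    """Each new stump is fitted to the residuals the ensemble leaves behind."""
    pred = [0.0] * len(y)
    for _ in range(rounds):
        resid = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, resid)
        pred = [pi + lr * stump(xi) for xi, pi in zip(x, pred)]
    return pred

x = [1, 2, 3, 4, 5, 6]
y = [1, 1, 1, 5, 5, 5]
print([round(p, 2) for p in boost(x, y)])        # → [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
```

Each round shrinks the remaining error, which is exactly the sequential-correction behaviour the definition describes.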
88. Ensemble Methods (CO4)
• Also, in boosting the data set is weighted, so that observations that were incorrectly classified by learner n are given more importance when training learner n + 1, while in bagging the training samples are drawn at random from the whole dataset.
• While in bagging the weak learners are trained in parallel using randomness, in boosting they are trained sequentially, such that each subsequent learner aims to reduce the errors of the previous learners.
• Boosting, like bagging, can be used for regression as well as for classification problems.
• Boosting is mainly focused on reducing bias, whereas bagging is mainly focused on reducing variance.
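That bagging reduces variance can be seen with a toy simulation: averaging n independent noisy estimates shrinks the variance by roughly a factor of n. This is an illustrative sketch under that idealized independence assumption, not a real bagging implementation:

```python
import random

rng = random.Random(42)
TRUE_VALUE = 1.0

def weak_estimate():
    """A single high-variance estimate of the true value."""
    return TRUE_VALUE + rng.gauss(0, 1)

def bagged_estimate(n_models=25):
    """Bagging-style aggregation: average n independent estimates."""
    return sum(weak_estimate() for _ in range(n_models)) / n_models

def variance(f, trials=2000):
    vals = [f() for _ in range(trials)]
    mean = sum(vals) / trials
    return sum((v - mean) ** 2 for v in vals) / trials

ratio = variance(weak_estimate) / variance(bagged_estimate)
print(ratio)   # close to 25: averaging 25 learners cuts variance ~25x
```

In real bagging the bootstrapped trees are correlated, so the reduction is smaller than 1/n, but the direction of the effect is the same.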
89. Ensemble Methods (CO4)
• Pros (largely inherited from the decision trees used as base learners):
• Computational scalability
• Handles missing values
• Robust to outliers
• Does not require feature scaling
• Can deal with irrelevant inputs
• Interpretable (if small)
• Handles mixed predictors (quantitative and qualitative)
90. Ensemble Methods (CO4)
• Disadvantages of a Boosting Model:
1. Boosting is sensitive to outliers, since every new learner is obliged to fix the errors of its predecessors; the method can therefore become overly dependent on outliers.
2. Boosting is hard to parallelize: every estimator depends on the predictions of the previous ones, which makes the procedure difficult to scale up.
91. Daily Quiz
• What mathematical concept is Naive Bayes based on?
• What are the different types of Naive Bayes classifiers?
• Is Naive Bayes a classification algorithm or a regression algorithm?
• What are some benefits of Naive Bayes?
• What are the cons of the Naive Bayes classifier?
92. Daily Quiz
• What is Naive Bayes?
• How does Naive Bayes work?
• What mathematical concept is Naive Bayes based on?
• What are the different types of Naive Bayes classifiers?
• Is Naive Bayes a classification algorithm or a regression algorithm?
• What are some benefits of Naive Bayes?
93. Glossary Questions
1. How many terms are required for building a Bayes model?
a) 1
b) 2
c) 3
d) 4
2. What is needed to make probabilistic systems feasible in the world?
a) Reliability
b) Crucial robustness
c) Feasibility
d) None of the mentioned
94. Glossary Questions
3. Where can Bayes' rule be used?
a) Solving queries
b) Increasing complexity
c) Decreasing complexity
d) Answering probabilistic query
4. What does a Bayesian network provide?
a) Complete description of the domain
b) Partial description of the domain
c) Complete description of the problem
d) None of the mentioned
95. MCQ
Question 1 :
Naive Bayes assumes?
Options :
a. Conditional Independence
b. Conditional Dependence
c. Both a and b
d. None of the above
Question 2 :
Naive Bayes requires?
Options :
a. Categorical Values
b. Numerical Values
c. Either a or b
d. Both a and b
96. MCQ
Question 3 :
A probabilistic model of the data within each class is called?
Options :
a. Discriminative classification
b. Generative classification
c. Probabilistic classification
d. Both b and c
Question 4 :
A classification rule learned directly from the data is called?
Options :
a. Discriminative classification
b. Generative classification
c. Probabilistic classification
d. Both a and c
97. MCQ
Question 5 :
Spam classification is an example of?
Options :
a. Naive Bayes
b. Probabilistic condition
c. Random Forest
d. All of the above
Question 6 :
The time complexity of a Naive Bayes classifier with n features and L classes is
Options :
a. n*L
b. O(n + L)
c. O(n*L)
d. O(n/L)
98. MCQ
Question 7 :
Naive Bayes cannot capture complex interactions and?
Options :
a. Local Structure
b. Statistical Model
c. Both a and b
d. None of these
Question 8 :
Given a list of symptoms, predicting whether a patient has disease X or not is an example of?
Options :
a. Medical Diagnosis
b. Weather Diagnosis
c. Spam Diagnosis
d. All the Above
99. MCQ
Question 9 :
In Naive Bayes, numerical variables must be binned and converted to?
Options :
a. Categorical Values
b. Numerical Values
c. Either a or b
d. Both a and b
Question 10 :
In exact Bayes, the calculation is limited to records that match on their?
Options :
a. Diagnosis value
b. Probabilistic condition
c. Characteristics
d. None of the above
100. Faculty Video Links, YouTube & NPTEL Video Links and Online Courses Details
YouTube videos:
•https://www.youtube.com/watch?v=PDYfCkLY_DE
•https://www.youtube.com/watch?v=ncOirIPHTOw
•https://www.youtube.com/watch?v=cW03t3aZkmE
101. Weekly Assignment
Assignment 1
• What are the cons of Naive Bayes classifier?
• What are the applications of Naive Bayes?
• Is Naive Bayes a discriminative classifier or a generative classifier?
• What is the formula given by Bayes theorem?
• What are posterior probability and prior probability in Naive Bayes?
• Define likelihood and evidence in Naive Bayes?
• Define Bayes theorem in terms of prior, evidence and likelihood.
• While calculating the probability of a given situation in Naive Bayes, what error can we run into, and how can we solve it?
102. Old Question Papers
Note: No old question papers are available for this subject, since it has been introduced for the first time. Expected questions for the university exam are given on the next slide.
103. Expected Questions for University Exam
1. Give an introduction to Bayesian statistics and Bayes' theorem.
2. Explain the Bayes' box.
3. Which is better: Bayesian or frequentist statistics?
4. How is Bayesian statistics related to machine learning?
5. Explain the Naive Bayes classifier.
6. Explain the strengths of Bayesian statistics.
7. Do you think that Bayesian statistics has the power to replace frequentist statistics?
8. Explain the difference between maximum likelihood estimation (MLE) and Bayesian statistics.
9. What are some unique applications of Bayesian statistics and Bayes' theorem?
10. Why is Bayesian statistics important?
104. References
Text books:
1. Tom M. Mitchell, Machine Learning, McGraw-Hill Education (India) Private Limited, 2013.
2. Ethem Alpaydin, Introduction to Machine Learning (Adaptive Computation and Machine Learning), The MIT Press, 2004.
3. Stephen Marsland, Machine Learning: An Algorithmic Perspective, CRC Press, 2009.
4. Christopher M. Bishop, Pattern Recognition and Machine Learning, Springer-Verlag, Berlin, 2006.
105. Recap of Unit
Naive Bayes algorithms are mostly used in sentiment analysis, spam filtering, recommendation systems, etc. They are fast and easy to implement, but their biggest disadvantage is the requirement that the predictors be independent. In most real-life cases the predictors are dependent, which hinders the performance of the classifier.
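The spam-filtering use case in the recap can be made concrete with a toy Bernoulli Naive Bayes sketch (hand-rolled, with made-up messages; a real system would use a library implementation). Each class score is the log prior plus per-word log likelihoods, with Laplace smoothing, under the naive assumption that words occur independently:

```python
import math

# Toy training data: each message is the set of words it contains
spam = [{"free", "win"}, {"free", "offer"}, {"win", "offer"}]
ham = [{"hello", "meeting"}, {"meeting", "offer"}, {"hello", "win"}]
vocab = set().union(*spam, *ham)

def class_score(msg, docs, total_docs=6):
    """Log prior plus per-word log likelihoods with Laplace smoothing;
    the 'naive' step is treating every word as independent."""
    score = math.log(len(docs) / total_docs)          # log P(class)
    for w in vocab:
        p = (sum(w in d for d in docs) + 1) / (len(docs) + 2)
        score += math.log(p if w in msg else 1 - p)
    return score

msg = {"free", "win"}
label = "spam" if class_score(msg, spam) > class_score(msg, ham) else "ham"
print(label)   # → spam
```

The independence assumption criticized in the recap is visible in the code: the per-word probabilities are simply summed in log space, with no term for word interactions.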