Regression, Bayesian Learning and Support vector machine

Machine Learning Techniques
Regression
Bayesian Learning
Support Vector Machine
Dr. Radhey Shyam
Professor
Dept. of Computer Science & Engg., BIET Lucknow
Following slides have been prepared by Radhey Shyam, with grateful acknowledgement of others who made their course
contents freely available. Feel free to reuse these slides for your own academic purposes. Please send feedback at
shyam0058@gmail.com
Dr. Radhey Shyam (BIET Lucknow-AKTU) Machine Learning Techniques September 29, 2020 1 / 64

Outline
1 Regression
Simple Linear Regression
Logistic Regression
2 Bayesian Learning
Bayes Theorem
Concept Learning
Bayes Optimal Classiﬁer
NAIVE BAYES CLASSIFIER
BAYESIAN BELIEF NETWORKS
EM Algorithm
3 Support Vector Machine
4 Conclusion
5 References

Regression
Linear Regression— In statistics, linear regression is a linear approach to
modeling the relationship between a scalar response (or dependent
variable) and one or more explanatory variables (or independent variables).
The case of one explanatory variable is called simple linear regression.
Linear regression is used to predict the continuous dependent variable
using a given set of independent variables.
Linear regression is used to solve regression problems.
In linear regression, the values of continuous variables are predicted.
Linear regression tries to find the best-fit line, through which the
output can be easily predicted.
The least squares estimation method¹ is used to fit the line, choosing
the coefficients so that predictions are as accurate² as possible.
The output for Linear Regression must be a continuous value, such as
price, age, etc.
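As a minimal sketch of how the best-fit line can be found, the closed-form least squares solution for one explanatory variable is shown below. The data values and the function name `fit_simple_linear_regression` are illustrative assumptions, not part of the original slides.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (slope, intercept) minimizing the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Closed-form least squares estimates for a single explanatory variable.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data lying exactly on the line y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]

slope, intercept = fit_simple_linear_regression(xs, ys)
prediction = slope * 6.0 + intercept  # continuous output for an unseen x
print(slope, intercept, prediction)
```

Because the toy data is noise-free, the fitted line recovers the generating slope and intercept; with real data the same formulas give the line with the smallest sum of squared residuals.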

Regression (cont.)
In linear regression, the relationship between the dependent variable
and the independent variables must be linear.
In linear regression, there may be collinearity³ between the
independent variables.
Some Regression examples:
Regression analysis is used in stats to ﬁnd trends in data. For
example, you might guess that there is a connection between how
much you eat and how much you weigh; regression analysis can help
you quantify that.
Regression analysis will provide you with an equation for a graph so
that you can make predictions about your data. For example, if
you’ve been putting on weight over the last few years, it can predict
how much you’ll weigh in ten years time if you continue to put on
weight at the same rate.

Regression with a single explanatory variable is called simple linear
regression. It establishes the relationship between two variables using
a straight line. If two or more explanatory variables have a linear
relationship with the dependent variable, the regression is called
multiple linear regression.

Logistic Regression: used to solve classification problems where, given
an element, you have to classify it into one of N categories. Typical
examples: given an email, classify it as spam or not spam; given a
vehicle, find the category it belongs to (car, truck, van, etc.).
In other words, the output is one of a finite set of discrete values.
Logistic Regression is used to predict the categorical dependent
variable using a given set of independent variables.
Logistic regression is used for solving Classification problems.
In logistic Regression, we predict the values of categorical variables.
In Logistic Regression, we find the S-curve by which we can classify
the samples.
The maximum likelihood estimation method is used to estimate the
model parameters.
The output of Logistic Regression must be a Categorical value such
as 0 or 1, Yes or No, etc.

In logistic regression, a linear relationship between the dependent
and independent variables is not required.
In logistic regression, there should not be collinearity between the
independent variables.
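The S-curve and the maximum likelihood fit described above can be sketched as follows. This is an illustrative implementation using plain gradient ascent on the log-likelihood; the data, learning rate, and function names are assumptions chosen for the example.

```python
import math


def sigmoid(z):
    """The S-shaped logistic function, mapping any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic_regression(xs, ys, learning_rate=0.1, epochs=2000):
    """Fit P(y=1|x) = sigmoid(w*x + b) by gradient ascent on the log-likelihood."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradient of the average log-likelihood with respect to w and b.
        grad_w = sum((y - sigmoid(w * x + b)) * x for x, y in zip(xs, ys)) / n
        grad_b = sum((y - sigmoid(w * x + b)) for x, y in zip(xs, ys)) / n
        w += learning_rate * grad_w
        b += learning_rate * grad_b
    return w, b


# Hypothetical data: one feature, binary label, with some class overlap.
xs = [0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 4.5]
ys = [0, 0, 0, 1, 0, 1, 1, 1]

w, b = fit_logistic_regression(xs, ys)


def predict(x):
    """Categorical output: 1 if the estimated probability is at least 0.5."""
    return 1 if sigmoid(w * x + b) >= 0.5 else 0


print(predict(0.5), predict(4.5))
```

The model outputs a probability, and the categorical prediction comes from thresholding that probability at 0.5.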
¹ The least squares method is a statistical procedure to find the best fit for a set of
data points by minimizing the sum of the squared offsets of points from the fitted curve.
Least squares regression is used to predict the behavior of dependent variables.
² Accuracy is how close a measured value is to the actual value. Precision is how
close the measured values are to each other.
³ Collinearity is a condition in which some of the independent variables are highly
correlated.

Bayesian Learning
Bayesian Decision Theory came long before Version Spaces, Decision
Tree Learning and Neural Networks. It was studied in the field of
Statistical Theory and more specifically, in the field of Pattern
Recognition.
Bayesian Decision Theory is the basis of important learning
schemes such as the Naïve Bayes Classifier, learning Bayesian Belief
Networks, and the EM Algorithm.
Bayesian Decision Theory is also useful as it provides a framework
within which many non-Bayesian classifiers can be studied.
Bayes Theorem
Goal — To determine the most probable hypothesis, given the data
D plus any initial knowledge about the prior probabilities of the
various hypotheses in H.

Bayesian Learning (cont.)
Prior probability of h, P(h) — it reflects any background
knowledge we have about the chance that h is a correct hypothesis
(before having observed the data).
Prior probability of D, P(D) — it reflects the probability that
training data D will be observed given no knowledge about which
hypothesis h holds.
Conditional Probability of observation D, P(D|h) — it denotes
the probability of observing data D given some world in which
hypothesis h holds.
Posterior probability of h, P(h|D) — it represents the probability
that h holds given the observed training data D. It reflects our
confidence that h holds after we have seen the training data D and it
is the quantity that Machine Learning researchers are interested in.

Bayes Theorem allows us to compute P(h|D):
$$P(h \mid D) = \frac{P(D \mid h)\,P(h)}{P(D)}$$
Maximum A Posteriori (MAP) Hypothesis and Maximum Likelihood
Goal: to find the most probable hypothesis h from a set of
candidate hypotheses H given the observed data D. The MAP
hypothesis is
$$h_{MAP} = \arg\max_{h \in H} P(h \mid D) = \arg\max_{h \in H} \frac{P(D \mid h)\,P(h)}{P(D)} = \arg\max_{h \in H} P(D \mid h)\,P(h)$$
The last step drops P(D) because it does not depend on h.

If every hypothesis in H is equally probable a priori, we only need to
consider the likelihood of the data D given h, P(D|h). Then hMAP
reduces to the maximum likelihood hypothesis,
$$h_{ML} = \arg\max_{h \in H} P(D \mid h)$$
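A small numerical sketch may help separate the posterior, the MAP hypothesis, and the maximum likelihood hypothesis. The priors and likelihoods below are hypothetical values chosen so that the two hypotheses differ; the function name `posteriors` is likewise an assumption.

```python
def posteriors(priors, likelihoods):
    """Apply Bayes theorem: P(h|D) = P(D|h) P(h) / P(D)."""
    # P(D) is obtained by summing P(D|h) P(h) over all hypotheses.
    evidence = sum(likelihoods[h] * priors[h] for h in priors)
    return {h: likelihoods[h] * priors[h] / evidence for h in priors}


priors = {"h1": 0.7, "h2": 0.2, "h3": 0.1}       # P(h), hypothetical
likelihoods = {"h1": 0.1, "h2": 0.5, "h3": 0.6}  # P(D|h), hypothetical

post = posteriors(priors, likelihoods)

# MAP weighs the likelihood by the prior; ML uses the likelihood alone.
h_map = max(priors, key=lambda h: likelihoods[h] * priors[h])
h_ml = max(priors, key=lambda h: likelihoods[h])

print(post, h_map, h_ml)
```

With these numbers the MAP hypothesis is h2 while the maximum likelihood hypothesis is h3, which shows that the two only coincide in general when the priors are uniform.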
Concept Learning
Inducing general functions from speciﬁc training examples is a main
issue of machine learning.
Concept Learning: Acquiring the deﬁnition of a general category from
given sample positive and negative training examples of the category.

Concept Learning can be seen as a problem of searching through a
predefined space of potential hypotheses for the hypothesis that best
fits the training examples.
The hypothesis space has a general-to-specific ordering of hypotheses,
and the search can be efficiently organized by taking advantage of a
naturally occurring structure over the hypothesis space.
A Formal Definition for Concept Learning:
Inferring a boolean-valued function from training examples of
its input and output.
An example for concept-learning is the learning of bird-concept from
the given examples of birds (positive examples) and non-birds
(negative examples).
We are trying to learn the definition of a concept from given examples.

The training data is a set of example days, each described by six attributes.

The task is to learn to predict the value of EnjoySport for an arbitrary
day, based on the values of its attributes.
EnjoySport – Hypothesis Representation
Each hypothesis consists of a conjunction of constraints on the
instance attributes.
Each hypothesis will be a vector of six constraints, specifying the
values of the six attributes
(Sky, AirTemp, Humidity, Wind, Water, and Forecast).
Each attribute constraint will be one of:
? : any value is acceptable for the attribute (don't care)
a single value : one specific value is required for the attribute (e.g. Warm)
φ : no value is acceptable for the attribute

Hypothesis Representation
A hypothesis:
Sky AirTemp Humidity Wind Water Forecast
< Sunny, ?, ?, Strong, ?, Same >
The most general hypothesis – that every day is a positive example
<?, ?, ?, ?, ?, ? >
The most speciﬁc hypothesis – that no day is a positive example
< φ, φ, φ, φ, φ, φ >
The EnjoySport concept learning task requires learning the set of days
for which EnjoySport = yes, describing this set by a conjunction of
constraints over the instance attributes.
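The hypothesis representation above can be sketched in a few lines. The function name `satisfies` and the sample day are illustrative assumptions; the constraint symbols follow the slides.

```python
ANY = "?"   # any value is acceptable
NONE = "φ"  # no value is acceptable

# Attribute order: Sky, AirTemp, Humidity, Wind, Water, Forecast


def satisfies(hypothesis, instance):
    """Return True if the instance meets every constraint of the hypothesis."""
    return all(
        constraint == ANY or (constraint != NONE and constraint == value)
        for constraint, value in zip(hypothesis, instance)
    )


h = ("Sunny", ANY, ANY, "Strong", ANY, "Same")
most_general = (ANY,) * 6
most_specific = (NONE,) * 6

# A hypothetical day, used only to exercise the three hypotheses.
day = ("Sunny", "Warm", "High", "Strong", "Cold", "Same")

print(satisfies(h, day), satisfies(most_general, day), satisfies(most_specific, day))
```

A hypothesis classifies a day as positive exactly when `satisfies` returns True, so the most general hypothesis accepts every day and the most specific one rejects every day.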

EnjoySport Concept Learning Task
Given
Instances X : set of all possible days, each described by the attributes
Sky – (values: Sunny, Cloudy, Rainy)
AirTemp – (values: Warm, Cold)
Humidity – (values: Normal, High)
Wind – (values: Strong, Weak)
Water – (values: Warm, Cold)
Forecast – (values: Same, Change)
TargetConcept(Function) c : EnjoySport : X → {0, 1}
Hypotheses H : Each hypothesis is described by a conjunction of
constraints on the attributes.
TrainingExamples D: positive and negative examples of the target
function
Determine
A hypothesis h in H such that h(x) = c(x) for all x in D.

The Inductive Learning Hypothesis
Although the learning task is to determine a hypothesis h identical to
the target concept c over the entire set of instances X, the only
information available about c is its value over the training examples.
Inductive learning algorithms can at best guarantee that the
output hypothesis fits the target concept over the training data.
Lacking any further information, our assumption is that the best
hypothesis regarding unseen instances is the hypothesis that best fits
the observed training data. This is the fundamental assumption of
inductive learning.
The Inductive Learning Hypothesis - Any hypothesis found to
approximate the target function well over a sufficiently large set of
training examples will also approximate the target function well over
other unobserved examples.

Concept Learning as Search
Concept learning can be viewed as the task of searching through a
large space of hypotheses implicitly defined by the hypothesis
representation.
The goal of this search is to find the hypothesis that best fits the
training examples.
By selecting a hypothesis representation, the designer of the learning
algorithm implicitly defines the space of all hypotheses that the
program can ever represent and therefore can ever learn.
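To give a sense of how large this search space is, the sketch below counts instances and hypotheses for the EnjoySport attributes listed earlier (Sky has three values, the other five attributes have two each). The variable names are illustrative assumptions.

```python
from math import prod

# Number of possible values per attribute in the EnjoySport task.
attribute_values = {
    "Sky": 3, "AirTemp": 2, "Humidity": 2,
    "Wind": 2, "Water": 2, "Forecast": 2,
}

# Every combination of attribute values is a distinct instance (day).
num_instances = prod(attribute_values.values())

# Syntactically, each constraint is a value, "?" or "φ": k + 2 options.
num_syntactic = prod(k + 2 for k in attribute_values.values())

# Any hypothesis containing φ rejects every day, so all of those collapse
# into a single hypothesis; the rest use a value or "?" per attribute.
num_semantic = 1 + prod(k + 1 for k in attribute_values.values())

print(num_instances, num_syntactic, num_semantic)
```

Even this tiny task has thousands of syntactically distinct hypotheses, which is why the general-to-specific ordering is useful for organizing the search.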

Bayes Optimal Classifier
We have considered the question so far “what is the most probable
hypothesis given the training data?”
In fact, the question that is often of most significance is the closely
related question “what is the most probable classification of the new
instance given the training data?”
Although it may seem that this second question can be answered by
simply applying the MAP hypothesis to the new instance, in fact it is
possible to do better.
To develop some intuitions consider a hypothesis space containing
three hypotheses, h1, h2, and h3.
Suppose that the posterior probabilities of these hypotheses given the
training data are .4, .3, and .3 respectively.

Thus, h1 is the MAP hypothesis.
Suppose a new instance x is encountered, which is classified positive
by h1 , but negative by h2 and h3.
Taking all hypotheses into account, the probability that x is positive
is .4 (the probability associated with h1), and the probability that it is
negative is therefore .6. The most probable classification (negative) in
this case is different from the classification generated by the MAP
hypothesis.
In general, the most probable classification of the new instance is
obtained by combining the predictions of all hypotheses, weighted by
their posterior probabilities.
If the possible classification of the new example can take on any value
vj from some set V, then the probability P(vj|D) that the correct
classification for the new instance is vj is
$$P(v_j \mid D) = \sum_{h_i \in H} P(v_j \mid h_i)\,P(h_i \mid D)$$

The optimal classification of the new instance is the value vj for
which P(vj|D) is maximum:
$$\arg\max_{v_j \in V} \sum_{h_i \in H} P(v_j \mid h_i)\,P(h_i \mid D) \qquad (1)$$
Any system that classifies new instances according to Equation (1) is
called a Bayes optimal classifier, or Bayes optimal learner. No
other classification method using the same hypothesis space and same
prior knowledge can outperform this method on average.
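The three-hypothesis example above can be checked with a short sketch. The posteriors .4, .3, .3 and the predictions come from the slides; the function and variable names are assumptions, and each hypothesis is treated as predicting deterministically, so P(v|h) is either 0 or 1.

```python
posterior = {"h1": 0.4, "h2": 0.3, "h3": 0.3}    # P(h|D) from the example
prediction = {"h1": "+", "h2": "-", "h3": "-"}   # classification of x by each h


def p_value_given_h(value, h):
    """P(v|h) for a deterministic hypothesis: 1 if h predicts v, else 0."""
    return 1.0 if prediction[h] == value else 0.0


def bayes_optimal(values, posterior):
    """Weight every hypothesis's vote by its posterior and pick the best value."""
    scores = {
        v: sum(p_value_given_h(v, h) * posterior[h] for h in posterior)
        for v in values
    }
    return max(scores, key=scores.get), scores


best, scores = bayes_optimal(["+", "-"], posterior)
map_hypothesis = max(posterior, key=posterior.get)

print(best, scores, prediction[map_hypothesis])
```

The combined vote gives negative a probability of .6, so the Bayes optimal classification is negative even though the MAP hypothesis h1 alone predicts positive.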

NAIVE BAYES CLASSIFIER
One highly practical Bayesian learning method is the naive Bayes
learner, often called the naive Bayes classifier.
In some domains its performance has been shown to be comparable
to that of neural network and decision tree learning.
This section introduces the naive Bayes classifier; the next section
applies it to the practical problem of learning to classify natural
language text documents.
The naive Bayes classifier applies to learning tasks where each
instance x is described by a conjunction of attribute values and where
the target function f(x) can take on any value from some finite set
V . A set of training examples of the target function is provided, and
a new instance is presented, described by the tuple of attribute values
< a1, a2...an >. The learner is asked to predict the target value, or
classification, for this new instance.

The Bayesian approach to classifying the new instance is to assign the
most probable target value, vMAP , given the attribute values
< a1, a2...an > that describe the instance.
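$$v_{MAP} = \arg\max_{v_j \in V} P(v_j \mid a_1, a_2 \ldots a_n)$$
We can use Bayes theorem to rewrite this expression as
$$v_{MAP} = \arg\max_{v_j \in V} \frac{P(a_1, a_2 \ldots a_n \mid v_j)\,P(v_j)}{P(a_1, a_2 \ldots a_n)} \qquad (2)$$
$$= \arg\max_{v_j \in V} P(a_1, a_2 \ldots a_n \mid v_j)\,P(v_j) \qquad (3)$$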
Now we could attempt to estimate the two terms in Equation (3)
based on the training data.

It is easy to estimate each of the P(vj) simply by counting the
frequency with which each target value vj occurs in the training data.
However, estimating the diﬀerent P(a1, a2...an|vj) terms in this
fashion is not feasible unless we have a very large set of training data.
The problem is that the number of these terms is equal to the number
of possible instances times the number of possible target values.
Therefore, we need to see every instance in the instance space many
times in order to obtain reliable estimates.
The naive Bayes classiﬁer is based on the simplifying assumption that
the attribute values are conditionally independent given the target
value.
In other words, the assumption is that given the target value of the
instance, the probability of observing the conjunction a1, a2...an is
just the product of the probabilities for the individual attributes:
$$P(a_1, a_2 \ldots a_n \mid v_j) = \prod_i P(a_i \mid v_j)$$
Substituting this into Equation (3), we have the approach used by the
naive Bayes classifier:
$$v_{NB} = \arg\max_{v_j \in V} P(v_j) \prod_i P(a_i \mid v_j) \qquad (4)$$
where vNB denotes the target value output by the naive Bayes
classiﬁer.
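A minimal sketch of Equation (4), with probabilities estimated by simple frequency counts, might look like the following. The training examples and function names are hypothetical, and no smoothing is applied, so an unseen attribute value gives a probability of zero.

```python
from collections import Counter


def train_naive_bayes(examples):
    """examples: list of (attribute_tuple, target_value) pairs."""
    class_counts = Counter(v for _, v in examples)
    # attr_counts[(i, a, v)]: examples of class v whose i-th attribute equals a.
    attr_counts = Counter()
    for attrs, v in examples:
        for i, a in enumerate(attrs):
            attr_counts[(i, a, v)] += 1
    return class_counts, attr_counts


def classify(attrs, class_counts, attr_counts):
    """Return the value maximizing P(v) * prod_i P(a_i | v), plus all scores."""
    total = sum(class_counts.values())
    scores = {}
    for v, n_v in class_counts.items():
        score = n_v / total                          # estimate of P(v)
        for i, a in enumerate(attrs):
            score *= attr_counts[(i, a, v)] / n_v    # estimate of P(a_i | v)
        scores[v] = score
    return max(scores, key=scores.get), scores


# Hypothetical training set with attributes (Sky, Wind).
examples = [
    (("Sunny", "Strong"), "yes"),
    (("Sunny", "Weak"), "yes"),
    (("Rainy", "Weak"), "yes"),
    (("Rainy", "Strong"), "no"),
    (("Rainy", "Strong"), "no"),
    (("Sunny", "Strong"), "no"),
]

class_counts, attr_counts = train_naive_bayes(examples)
label, scores = classify(("Rainy", "Strong"), class_counts, attr_counts)
print(label, scores)
```

Only one count per (attribute, value, class) triple is stored, which is why the conditional independence assumption makes estimation feasible with little data.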

BAYESIAN BELIEF NETWORKS
Abbreviation : BBN (Bayesian Belief Network)
Synonyms: Bayes (ian) network, Bayes(ian) model, Belief network,
Decision network, or probabilistic directed acyclic graphical model.
A BBN is a probabilistic graphical model that represents a set of
variables and their conditional dependencies via a Directed Acyclic
Graph (DAG).

BBNs enable us to model and reason about uncertainty. BBNs
accommodate both subjective probabilities and probabilities based on
objective data.
The most important use of BBNs is in revising probabilities in the
light of actual observations of events.
Nodes represent variables in the Bayesian sense: observable
quantities, hidden variables or hypotheses. Edges represent
conditional dependencies.
Each node is associated with a probability function that takes, as
input, a particular set of values for the node's parent variables, and
outputs the probability of the values of the variable represented by
the node.
Prior Probabilities: e.g. P(RAIN)
Conditional Probabilities: e.g. P(SPRINKLER | RAIN)

Joint Probability Function: P(GRASS WET, SPRINKLER, RAIN) =
P(GRASS WET | RAIN, SPRINKLER) * P(SPRINKLER | RAIN) * P(RAIN)

Typically the probability functions are described in table form.
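The factorization above can be turned into a small inference sketch: the joint probability is the product of the three table entries, and a posterior such as P(RAIN | GRASS WET) follows by summing the joint over the unobserved variable. The probability tables below are illustrative values assumed for this example, not taken from the slides.

```python
from itertools import product

# Illustrative probability tables (assumed values).
P_RAIN = {True: 0.2, False: 0.8}
P_SPRINKLER_ON = {True: 0.01, False: 0.4}   # P(SPRINKLER=on | RAIN)
P_WET = {                                    # P(GRASS WET | SPRINKLER, RAIN)
    (True, True): 0.99, (True, False): 0.9,
    (False, True): 0.8, (False, False): 0.0,
}


def joint(wet, sprinkler, rain):
    """P(GRASS WET, SPRINKLER, RAIN) as the product of the three factors."""
    p_rain = P_RAIN[rain]
    p_spr = P_SPRINKLER_ON[rain] if sprinkler else 1.0 - P_SPRINKLER_ON[rain]
    p_wet = P_WET[(sprinkler, rain)] if wet else 1.0 - P_WET[(sprinkler, rain)]
    return p_wet * p_spr * p_rain


# The joint distribution must sum to 1 over all eight assignments.
total = sum(joint(w, s, r) for w, s, r in product([True, False], repeat=3))

# Revise the probability of rain after observing that the grass is wet.
p_wet = sum(joint(True, s, r) for s, r in product([True, False], repeat=2))
p_rain_given_wet = sum(joint(True, s, True) for s in [True, False]) / p_wet

print(total, p_rain_given_wet)
```

Under these assumed tables, observing wet grass raises the probability of rain from the prior of 0.2 to roughly 0.36, which illustrates revising probabilities in the light of observations.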

EM Algorithm
The EM algorithm was explained and given its name in a classic
1977 paper by Arthur Dempster, Nan Laird, and Donald Rubin.
• EM is typically used to compute maximum likelihood estimates
given incomplete samples.
• The EM algorithm estimates the parameters of a model
iteratively.
– Starting from some initial guess, each iteration consists of
an E step (Expectation step) and an M step (Maximization step).

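General Idea
[Figure: the EM loop. Starting from an initial guess of the unknown parameters (probabilities), the E step combines the parameters with the observed structure (words, ice cream) to produce a guess of the unknown hidden structure (tags, parses, weather); the M step uses that guess to re-estimate the parameters. Source: 600.465 - Intro to NLP - J. Eisner]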

A general statement

Consider a sample (X1 ... Xn) which is drawn from a probability distribution P(X|A), where A are
parameters. If the Xs are independent with probability density function P(Xi|A), the joint
probability of the whole set is
$$P(X_1 \ldots X_n \mid A) = \prod_{i=1}^{n} P(X_i \mid A)$$
This may be maximised with respect to A to give the maximum likelihood estimates.
MLE (Maximum Likelihood Estimation)

• Given
– A sample X = {X1, …, Xn}
– A vector of parameters θ
• We define
– Likelihood of the data: P(X | θ)
– Log-likelihood of the data: L(θ) = log P(X | θ)
• Given X, find
$$\theta_{ML} = \arg\max_{\theta \in \Omega} L(\theta)$$

Basic setting in EM
• X is a set of data points: observed data
• θ is a parameter vector.
• EM is a method to find θML where
$$\theta_{ML} = \arg\max_{\theta \in \Omega} L(\theta) = \arg\max_{\theta \in \Omega} \log P(X \mid \theta)$$
• Calculating P(X | θ) directly is hard.
• Calculating P(X, Y | θ) is much simpler, where Y is "hidden" data (or
"missing" data).

The basic EM strategy
• Z = (X, Y)
– Z: complete data (“augmented data”)
– X: observed data (“incomplete” data)
– Y: hidden data (“missing” data)

The “missing” data Y
• Y need not necessarily be missing in the practical sense of the
word.
• It may just be a conceptually convenient technical device to
simplify the calculation of P(x |θ).
• There could be many possible Ys.

The EM algorithm
• Start with an initial estimate, θ^0
• Repeat until convergence
– E-step: calculate
$$Q(\theta, \theta^{t}) = \sum_{i=1}^{n} \sum_{y} P(y \mid x_i, \theta^{t}) \log P(x_i, y \mid \theta)$$
– M-step: find
$$\theta^{(t+1)} = \arg\max_{\theta} Q(\theta, \theta^{t})$$

The Q-function
The Q-function is the expected value of the complete-data log-likelihood
log P(X, Y | θ) with respect to Y, given X and θ^t, where
– Y is a random vector.
– X = (x1, x2, …, xn) is a constant (vector).
– θ^t is the current parameter estimate and is a constant (vector).
– θ is the normal variable (vector) that we wish to adjust.
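As a concrete sketch of the E and M steps, consider a hypothetical two-coin problem: each observation x_i is the number of heads in a fixed number of tosses of one of two biased coins, and the hidden variable y says which coin was used (each coin assumed equally likely). The data, starting values, and function name are assumptions for illustration; for this model the M-step maximizer of Q has a simple closed form.

```python
import math


def em_two_coins(head_counts, tosses, theta_a, theta_b, iterations=20):
    """Estimate the head probabilities of two coins when the coin identity is hidden."""
    log_likelihoods = []
    for _ in range(iterations):
        heads_a = tosses_a = heads_b = tosses_b = 0.0
        log_likelihood = 0.0
        for h in head_counts:
            # E-step: posterior P(y | x_i, theta_t) for each coin.
            # The binomial coefficient is the same for both coins and cancels.
            like_a = theta_a ** h * (1.0 - theta_a) ** (tosses - h)
            like_b = theta_b ** h * (1.0 - theta_b) ** (tosses - h)
            evidence = 0.5 * like_a + 0.5 * like_b
            resp_a = 0.5 * like_a / evidence
            resp_b = 1.0 - resp_a
            log_likelihood += math.log(evidence)
            # Accumulate expected counts weighted by the responsibilities.
            heads_a += resp_a * h
            tosses_a += resp_a * tosses
            heads_b += resp_b * h
            tosses_b += resp_b * tosses
        log_likelihoods.append(log_likelihood)
        # M-step: the theta maximizing Q is the expected fraction of heads.
        theta_a = heads_a / tosses_a
        theta_b = heads_b / tosses_b
    return theta_a, theta_b, log_likelihoods


# Hypothetical observations: heads out of 10 tosses in each of five trials.
head_counts = [5, 9, 8, 4, 7]
theta_a, theta_b, log_likelihoods = em_two_coins(head_counts, 10, 0.6, 0.5)
print(theta_a, theta_b)
```

The recorded log-likelihood (up to a constant) never decreases from one iteration to the next, which is the property that makes the iteration a sensible way to approach θML.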

Expectation Maximization EM
When to use
• data is only partially observable
• unsupervised clustering: target value unobservable
• supervised learning: some instance attributes unobservable
Applications
• training Bayesian Belief Networks
• unsupervised clustering
• learning hidden Markov models

SUPPORT VECTOR MACHINE
History of SVM
SVM is related to statistical learning theory.
SVM was first introduced in 1992.
SVM became popular because of its success in handwritten digit
recognition: a 1.1% test error rate, the same as the error rate of a
carefully constructed neural network.
SVM is now regarded as an important example of "kernel methods",
one of the key areas in machine learning.

SUPPORT VECTOR MACHINE (cont.)
Binary Classification
Given training data (xi, yi) for i = 1 . . . N, with
xi ∈ R^d and yi ∈ {−1, 1}, learn a classifier f(x)
such that
$$f(x_i) \begin{cases} \ge 0 & y_i = +1 \\ < 0 & y_i = -1 \end{cases}$$
i.e. yi f(xi) > 0 for a correct classification.

Linear separability
[Figure: two example datasets, one linearly separable and one not linearly separable]

Linear classifiers
A linear classifier has the form
$$f(x) = w^\top x + b$$
• in 2D the discriminant is a line
• w is the normal to the line, and b the bias
• w is known as the weight vector
[Figure: in the (X1, X2) plane, the line f(x) = 0 separates the region f(x) > 0 from the region f(x) < 0]

Linear classifiers
A linear classifier has the form
$$f(x) = w^\top x + b$$
• in 3D the discriminant is a plane, and in nD it is a hyperplane
For a K-NN classifier it was necessary to "carry" the training data.
For a linear classifier, the training data is used to learn w and then discarded.
Only w is needed for classifying new data.

The Perceptron Classifier
Given linearly separable data xi labelled into two categories yi ∈ {−1, 1},
find a weight vector w such that the discriminant function
$$f(x_i) = w^\top x_i + b$$
separates the categories for i = 1, …, N
• how can we find this separating hyperplane?
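The Perceptron Algorithm
Write the classifier as
$$f(x_i) = \tilde{w}^\top \tilde{x}_i + w_0 = w^\top x_i$$
where w = (w̃, w0), xi = (x̃i, 1)
• Initialize w = 0
• Cycle through the data points {xi, yi}
• if xi is misclassified then
$$w \leftarrow w - \alpha\,\mathrm{sign}(f(x_i))\,x_i$$
(for a misclassified point sign(f(xi)) = −yi, so this is the same as w ← w + α yi xi)
• Until all the data is correctly classified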
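The loop above can be sketched in Python as follows. This illustration uses the standard form of the update for a misclassified point, w ← w + α yi xi, and treats a point with yi f(xi) ≤ 0 as misclassified so that the all-zero starting vector triggers an update. The data, the `max_epochs` safeguard, and the function names are assumptions.

```python
def f(w, x):
    """Discriminant w^T x with the input augmented by a constant 1 for the bias."""
    augmented = list(x) + [1.0]
    return sum(wi * xi for wi, xi in zip(w, augmented))


def train_perceptron(points, labels, alpha=1.0, max_epochs=100):
    """Cycle through the data, updating w on every misclassified point."""
    w = [0.0] * (len(points[0]) + 1)          # initialize w = 0
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(points, labels):
            if y * f(w, x) <= 0:              # misclassified (or on the boundary)
                augmented = list(x) + [1.0]
                w = [wi + alpha * y * xi for wi, xi in zip(w, augmented)]
                mistakes += 1
        if mistakes == 0:                     # all data correctly classified
            break
    return w


# Hypothetical linearly separable data in 2D.
points = [(2, 3), (3, 3), (4, 5), (-1, -2), (-2, -1), (-3, -3)]
labels = [1, 1, 1, -1, -1, -1]

w = train_perceptron(points, labels)
print(w)
```

On this toy set a single update on the first point is enough to separate the classes; in general the number of passes depends on the data and on how small the margin is.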

For example in 2D
[Figure: the weight vector w in the (X1, X2) plane before and after an update on a misclassified point xi]
NB after convergence $w = \sum_{i}^{N} \alpha_i x_i$

• if the data is linearly separable, then the algorithm will converge
• convergence can be slow …
• separating line close to training data
• we would prefer a larger margin for generalization
[Figure: Perceptron example]

What is the best w?
• maximum margin solution: most stable under perturbations of the inputs

Tennis example
[Figure: scatter plot with axes Humidity and Temperature; points are marked as "play tennis" or "do not play tennis"]

Linear Support Vector Machines
[Figure: points in the (x1, x2) plane labelled +1 and −1]
Data: <xi, yi>, i = 1, ..., l
xi ∈ R^d
yi ∈ {−1, +1}

Linear SVM 2
[Figure: points labelled −1 and +1 separated by the hyperplane f(x)]
Data: <xi, yi>, i = 1, ..., l
xi ∈ R^d
yi ∈ {−1, +1}
All hyperplanes in R^d are parameterized by a vector w and a constant b,
and can be expressed as w•x + b = 0 (remember the equation for a hyperplane
from algebra!).
Our aim is to find a hyperplane such that f(x) = sign(w•x + b)
correctly classifies our data.

Definitions
Define the hyperplane H such that:
xi•w + b ≥ +1 when yi = +1
xi•w + b ≤ −1 when yi = −1
H1 and H2 are the planes:
H1: xi•w + b = +1
H2: xi•w + b = −1
The points on the planes H1 and H2 are the Support Vectors.
d+ = the shortest distance to the closest positive point
d− = the shortest distance to the closest negative point
The margin of a separating hyperplane is d+ + d−.
[Figure: hyperplane H between the parallel planes H1 and H2, with distances d+ and d−]

Maximizing the margin
We want a classifier with as big a margin as possible.
Recall the distance from a point (x0, y0) to a line
Ax + By + c = 0 is |A x0 + B y0 + c| / sqrt(A² + B²).
The distance between H and H1 is:
|w•x + b| / ||w|| = 1 / ||w||
The distance between H1 and H2 is: 2 / ||w||
In order to maximize the margin, we need to minimize ||w||, with the
condition that there are no data points between H1 and H2:
xi•w + b ≥ +1 when yi = +1
xi•w + b ≤ −1 when yi = −1
These can be combined into yi(xi•w + b) ≥ 1
[Figure: hyperplane H midway between H1 and H2, with distances d+ and d−]
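The margin formula can be checked numerically with a small sketch. The two support vectors, the weight vector, and the helper names below are hypothetical values chosen so that each point lies exactly on H1 or H2.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def margin_width(w):
    """Distance between H1 and H2, i.e. 2 / ||w||."""
    return 2.0 / math.sqrt(dot(w, w))


def satisfies_constraints(w, b, points, labels):
    """Check y_i (x_i . w + b) >= 1 for every training point."""
    return all(y * (dot(w, x) + b) >= 1.0 for x, y in zip(points, labels))


# Hypothetical data: one positive and one negative support vector.
points = [(1.0, 1.0), (-1.0, -1.0)]
labels = [1, -1]
w, b = (0.5, 0.5), 0.0

margin = margin_width(w)
feasible = satisfies_constraints(w, b, points, labels)
print(margin, feasible)
```

Here the margin equals the Euclidean distance between the two points, 2√2, as expected when both points are support vectors lying on opposite sides of H along the direction of w.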

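Constrained Optimization Problem
Minimize ||w|| (equivalently, ½||w||²) subject to $y_i(x_i \cdot w + b) \ge 1$ for all i.
Lagrangian method: maximize $\inf_{w} L(w, b, \alpha)$, where
$$L(w, b, \alpha) = \frac{1}{2}\|w\|^2 - \sum_i \alpha_i \left[ y_i (x_i \cdot w + b) - 1 \right]$$
At the extremum, the partial derivative of L with respect to both w and b must be 0.
Taking the derivatives, setting them to 0, substituting back into L, and simplifying yields:
$$\text{Maximize } \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} y_i y_j \alpha_i \alpha_j \,(x_i \cdot x_j) \quad \text{subject to } \sum_i \alpha_i y_i = 0 \text{ and } \alpha_i \ge 0$$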
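To make the dual formulation concrete, the sketch below evaluates the dual objective for a hypothetical two-point problem and recovers the weight vector from the multipliers using w = Σ αi yi xi, which is the condition obtained by setting the derivative of L with respect to w to zero. The data, the multiplier values, and the function names are assumptions.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def dual_objective(alphas, points, labels):
    """sum_i alpha_i - 1/2 sum_{i,j} y_i y_j alpha_i alpha_j (x_i . x_j)."""
    n = len(points)
    quadratic = sum(
        labels[i] * labels[j] * alphas[i] * alphas[j] * dot(points[i], points[j])
        for i in range(n)
        for j in range(n)
    )
    return sum(alphas) - 0.5 * quadratic


def weights_from_alphas(alphas, points, labels):
    """Recover w = sum_i alpha_i y_i x_i from the dual variables."""
    dim = len(points[0])
    return [
        sum(a * y * x[k] for a, y, x in zip(alphas, labels, points))
        for k in range(dim)
    ]


# Hypothetical problem: two points, one per class.
points = [(1.0, 1.0), (-1.0, -1.0)]
labels = [1, -1]
alphas = [0.25, 0.25]   # satisfies sum_i alpha_i y_i = 0 and alpha_i >= 0

dual_value = dual_objective(alphas, points, labels)
w = weights_from_alphas(alphas, points, labels)
primal_value = 0.5 * dot(w, w)
print(dual_value, w, primal_value)
```

For this tiny problem the dual value matches the primal value ½||w||², which is what one expects at the optimum of both problems.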

Quadratic Programming
• Why is this reformulation a good thing?
• The problem
$$\text{Maximize } \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} y_i y_j \alpha_i \alpha_j \,(x_i \cdot x_j) \quad \text{subject to } \sum_i \alpha_i y_i = 0 \text{ and } \alpha_i \ge 0$$
is an instance of what is called a positive semi-definite
programming problem
• For a fixed real-number accuracy, it can be solved in
O(n log n) time with n = |D|², i.e. O(|D|² log |D|²)

Problems with linear SVM
[Figure: data points labelled −1 and +1]
What if the decision function is not linear?

Kernel Trick
Data points are linearly separable in the space $(x_1^2,\; x_2^2,\; \sqrt{2}\,x_1 x_2)$.
We want to maximize
$$\sum_i \alpha_i - \frac{1}{2} \sum_{i,j} y_i y_j \alpha_i \alpha_j \, F(x_i) \cdot F(x_j)$$
Define $K(x_i, x_j) = F(x_i) \cdot F(x_j)$.
Cool thing: K is often easy to compute directly! Here, $K(x_i, x_j) = (x_i \cdot x_j)^2$.

Other Kernels
The polynomial kernel
$K(x_i, x_j) = (x_i \cdot x_j + 1)^p$, where p is a tunable parameter.
Evaluating K only requires one addition and one exponentiation
more than the original dot product.
Gaussian kernels (also called radial basis functions)
$K(x_i, x_j) = \exp(-\|x_i - x_j\|^2 / 2\sigma^2)$
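A short sketch can confirm the kernel trick numerically: for the feature map F(x) = (x1², x2², √2·x1·x2), the dot product in feature space equals the squared dot product in input space. The polynomial and Gaussian kernels are included for comparison; the sample points, parameter values, and function names are illustrative assumptions.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def feature_map(x):
    """Explicit map F(x) = (x1^2, x2^2, sqrt(2) x1 x2) for 2D inputs."""
    x1, x2 = x
    return (x1 * x1, x2 * x2, math.sqrt(2.0) * x1 * x2)


def quadratic_kernel(x, z):
    """K(x, z) = (x . z)^2, computed without building F(x)."""
    return dot(x, z) ** 2


def polynomial_kernel(x, z, p):
    """K(x, z) = (x . z + 1)^p."""
    return (dot(x, z) + 1.0) ** p


def gaussian_kernel(x, z, sigma):
    """K(x, z) = exp(-||x - z||^2 / (2 sigma^2))."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


x, z = (1.0, 2.0), (3.0, 4.0)

explicit = dot(feature_map(x), feature_map(z))   # dot product in feature space
implicit = quadratic_kernel(x, z)                # same value via the kernel
print(explicit, implicit, polynomial_kernel(x, z, 2), gaussian_kernel(x, z, 1.0))
```

The kernel gives the same number as the explicit feature-space dot product while only ever touching the original 2D vectors, which is the point of the trick.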

Overtraining/overfitting
An example: a botanist who knows his own trees really well. Every time he
sees a new tree, he claims it is not a tree.
A well known problem with machine learning methods is overtraining.
This means that we have learned the training data very well, but
we cannot classify unseen examples correctly.

Overtraining/overfitting 2
It can be shown that the portion, n, of unseen data that will be
misclassified is bounded by:
n ≤ number of support vectors / number of training examples
This is a measure of the risk of overtraining with SVM (there are also
other measures).
Ockham's razor principle: simpler systems are better than more complex ones.
In the SVM case: fewer support vectors mean a simpler representation of the
hyperplane.
Example: understanding a certain cancer is easier if it can be described by
one gene than if we have to describe it with 5000.

A practical example, protein
localization
• Proteins are synthesized in the cytosol.
• Transported into different subcellular
locations where they carry out their
functions.
• Aim: To predict in what location a
certain protein will end up!!!

Conclusion
Regression is a statistical technique used to establish the relationship
between one dependent variable and one or more independent
variables.
Bayesian learning uses Bayes theorem to determine the conditional
probability of a hypothesis given some evidence or observations.
SVM is a useful alternative to neural networks.
Two key concepts of SVM: maximize the margin and the kernel trick.
Many SVM implementations are available on the web for you to try
on your data set!

References
[1] Thomas M. Mitchell.
Machine Learning.
McGraw-Hill, Inc., USA, 1 edition, 1997.
[2] Christopher M. Bishop.
Pattern Recognition and Machine Learning.
Springer, 2006.
[3] Stephen Marsland.
Machine Learning: An Algorithmic Perspective, Second Edition.
2nd edition, 2014.
[4] Ethem Alpaydin.
Introduction to Machine Learning.
Adaptive Computation and Machine Learning. MIT Press, Cambridge, MA, 3 edition, 2014.
[5] B E Boser, I M Guyon, and V N Vapnik.
A training algorithm for optimal margin classiﬁers.
In Proceedings of the Fifth Annual Workshop on Computational Learning Theory, volume 5, pages 144–152,
Springer-Verlag, Berlin Heidelberg, 1992.
[6] V. Vapnik.
The nature of statistical learning theory.
In Proceedings of the Fifth Annual Workshop on Computational Learning Theory, Springer, 1999.

Thank You!
