Introduction to Pattern Analysis
Ricardo Gutierrez-Osuna
Texas A&M University
LECTURE 7: Kernel Density Estimation
g Non-parametric Density Estimation
g Histograms
g Parzen Windows
g Smooth Kernels
g Product Kernel Density Estimation
g The Naïve Bayes Classifier
Non-parametric density estimation
g In the previous two lectures we have assumed that either
n The likelihoods p(x|ωi) were known (Likelihood Ratio Test) or
n At least the parametric form of the likelihoods was known (Parameter Estimation)
g The methods that will be presented in the next two lectures do not
afford such luxuries
n Instead, they attempt to estimate the density directly from the data without
making any parametric assumptions about the underlying distribution
n Sounds challenging? You bet!
[Figure: a dataset plotted in the (x1, x2) feature space and a sketch of the class-conditional density P(x1, x2 | ωi) estimated from it, illustrating non-parametric density estimation]
The histogram
g The simplest form of non-parametric D.E. is the familiar histogram
n Divide the sample space into a number of bins and approximate the density at the center of
each bin by the fraction of points in the training data that fall into the corresponding bin
g The histogram requires two “parameters” to be defined: bin width and starting position of the first bin
g The histogram is a very simple form of D.E., but it has several drawbacks
n The final shape of the density estimate depends on the starting position of the bins
g For multivariate data, the final shape of the density is also affected by the orientation of the bins
n The discontinuities of the estimate are not due to the underlying density; they are only an artifact of the chosen bin locations
g These discontinuities make it very difficult, without experience, to grasp the structure of the data
n A much more serious problem is the curse of dimensionality, since the number of bins grows
exponentially with the number of dimensions
g In high dimensions we would require a very large number of examples or else most of the bins would
be empty
g All these drawbacks make the histogram unsuitable for most practical
applications except for rapid visualization of results in one or two dimensions
n Therefore, we will not spend more time looking at the histogram
$$P_H(x) = \frac{1}{N}\;\frac{\left[\text{number of } x^{(k} \text{ in the same bin as } x\right]}{\left[\text{width of the bin containing } x\right]}$$
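To make the estimator concrete, here is a minimal sketch in Python/NumPy (the function name, bin settings and example data are illustrative choices, not part of the lecture):

```python
import numpy as np

def histogram_density(data, x, bin_width=1.0, origin=0.0):
    """Histogram estimate P_H(x): fraction of the N samples that fall in the
    bin containing x, divided by the bin width."""
    data = np.asarray(data, dtype=float)
    n = len(data)
    bin_idx = np.floor((x - origin) / bin_width)                 # bin containing x
    k = np.sum(np.floor((data - origin) / bin_width) == bin_idx)  # samples in that bin
    return k / (n * bin_width)

# Example with an arbitrary one-dimensional dataset
X = [4, 5, 5, 6, 12, 14, 15, 15, 16, 17]
print(histogram_density(X, x=5.5, bin_width=2.0))   # density near the left cluster
```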
Non-parametric density estimation, general formulation (1)
g Let us return to the basic definition of probability to get a solid idea of
what we are trying to accomplish
n The probability that a vector x, drawn from a distribution p(x), will fall in a given
region ℜ of the sample space is
n Suppose now that N vectors {x(1, x(2, …, x(N} are drawn from the distribution. The
probability that k of these N vectors fall in ℜ is given by the binomial distribution
n It can be shown (from the properties of the binomial p.m.f.) that the mean and
variance of the ratio k/N are
n Therefore, as N→∞, the distribution becomes sharper (the variance gets
smaller) so we can expect that a good estimate of the probability P can be
obtained from the mean fraction of the points that fall within ℜ
$$P = \int_{\Re} p(x')\,dx'$$

$$P(k) = \binom{N}{k}\,P^k\,(1-P)^{N-k}$$

$$E\!\left[\frac{k}{N}\right] = P \qquad\text{and}\qquad \operatorname{Var}\!\left[\frac{k}{N}\right] = E\!\left[\left(\frac{k}{N}-P\right)^{\!2}\right] = \frac{P(1-P)}{N}$$

$$P \cong \frac{k}{N}$$
From [Bishop, 1995]
Non-parametric density estimation, general formulation (2)
n On the other hand, if we assume that ℜ is so small that p(x) does not vary
appreciably within it, then
g where V is the volume enclosed by region ℜ
n Merging with the previous result we obtain
n This estimate becomes more accurate as we increase the number of sample
points N and shrink the volume V
g In practice the value of N (the total number of examples) is fixed
n In order to improve the accuracy of the estimate p(x) we could let V approach zero, but then the region ℜ would become so small that it would enclose no examples
n This means that, in practice, we will have to find a compromise value for the
volume V
g Large enough to include enough examples within ℜ
g Small enough to support the assumption that p(x) is constant within ℜ
$$P = \int_{\Re} p(x')\,dx' \cong p(x)\,V$$

$$\left.\begin{aligned} P &\cong \frac{k}{N} \\ P &\cong p(x)\,V \end{aligned}\right\} \;\Rightarrow\; p(x) \cong \frac{k}{NV}$$
From [Bishop, 1995]
Non-parametric density estimation, general formulation (3)
g In conclusion, the general expression for non-parametric density
estimation becomes
g When applying this result to practical density estimation problems,
two basic approaches can be adopted
n We can choose a fixed value of the volume V and determine k from the data.
This leads to methods commonly referred to as Kernel Density Estimation
(KDE), which are the subject of this lecture
n We can choose a fixed value of k and determine the corresponding volume V
from the data. This gives rise to the k Nearest Neighbor (kNN) approach, which
will be covered in the next lecture
g It can be shown that both kNN and KDE converge to the true
probability density as N→∞, provided that V shrinks with N, and k
grows with N appropriately
$$p(x) \cong \frac{k}{NV} \qquad\text{where}\quad \begin{cases} V \text{ is the volume surrounding } x \\ N \text{ is the total number of examples} \\ k \text{ is the number of examples inside } V \end{cases}$$
From [Bishop, 1995]
Parzen windows (1)
g Suppose that the region ℜ that encloses the
k examples is a hypercube with sides of
length h centered at the estimation point x
n Then its volume is given by V=hD, where D is the
number of dimensions
g To find the number of examples that fall
within this region we define a kernel
function K(u)
n This kernel, which corresponds to a unit
hypercube centered at the origin, is known as a
Parzen window or the naïve estimator
n The quantity K((x-x(n)/h) is then equal to unity if
the point x(n is inside a hypercube of side h
centered on x, and zero otherwise
$$K(u) = \begin{cases} 1 & |u_j| < 1/2 \;\;\forall\, j = 1,\dots,D \\ 0 & \text{otherwise} \end{cases}$$

[Figure: a hypercube of side h centered at the estimation point x]
From [Bishop, 1995]
Parzen windows (2)
g The total number of points inside the
hypercube is then
g Substituting back into the expression
for the density estimate
n Notice that the Parzen window density
estimate resembles the histogram, with
the exception that the bin locations are
determined by the data points
$$k = \sum_{n=1}^{N} K\!\left(\frac{x - x^{(n}}{h}\right)$$

$$p_{KDE}(x) = \frac{1}{N h^D} \sum_{n=1}^{N} K\!\left(\frac{x - x^{(n}}{h}\right)$$
[Figure: a one-dimensional example with four data points; the window of volume V centered at x gives K((x-x(1)/h) = K((x-x(2)/h) = K((x-x(3)/h) = 1 and K((x-x(4)/h) = 0]
From [Bishop, 1995]
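A direct NumPy sketch of this estimator, assuming the hypercube kernel defined on the previous slide (names are illustrative):

```python
import numpy as np

def parzen_window_kde(X, x, h):
    """Parzen window estimate p_KDE(x) = 1/(N h^D) * sum_n K((x - x(n)/h),
    where K is the unit hypercube kernel."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:                 # treat a 1-D dataset as N samples with D=1
        X = X[:, None]
    x = np.atleast_1d(np.asarray(x, dtype=float))
    N, D = X.shape
    u = (x - X) / h                 # N x D array of scaled offsets
    k = np.sum(np.all(np.abs(u) < 0.5, axis=1))   # samples inside the hypercube
    return k / (N * h ** D)

X = [4, 5, 5, 6, 12, 14, 15, 15, 16, 17]
print(parzen_window_kde(X, x=15.0, h=4.0))   # 0.1, matching the numeric exercise below
```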
Parzen windows (3)
g To understand the role of the kernel function we compute the
expectation of the probability estimate p(x)
n where we have assumed that the vectors x(n are drawn independently from the
true density p(x)
g We can see that the expectation of the estimated density pKDE(x) is a
convolution of the true density p(x) with the kernel function
n The width h of the kernel plays the role of a smoothing parameter: the wider the
kernel function, the smoother the estimate pKDE(x)
g For h→0, the kernel approaches a Dirac delta function and pKDE(x)
approaches the true density
n However, in practice we have a finite number of points, so h cannot be made
arbitrarily small, since the density estimate pKDE(x) would then degenerate to a set
of impulses located at the training data points
$$E\!\left[p_{KDE}(x)\right] = \frac{1}{N h^D} \sum_{n=1}^{N} E\!\left[K\!\left(\frac{x - x^{(n}}{h}\right)\right] = \frac{1}{h^D}\,E\!\left[K\!\left(\frac{x - x^{(n}}{h}\right)\right] = \frac{1}{h^D}\int K\!\left(\frac{x - x'}{h}\right) p(x')\,dx'$$
From [Bishop, 1995]
Numeric exercise
g Given the dataset below, use Parzen windows to estimate the density
p(x) at y=3,10,15. Use a bandwidth of h=4
n X = {x(1, x(2,…x(N} = {4, 5, 5, 6, 12, 14, 15, 15, 16, 17}
g Solution
n Let’s first draw the dataset to get an idea of what numerical results we should
expect
n Let’s now estimate p(y=3):
n Similarly
[Figure: the ten data points plotted on the x axis, with the estimation points y=3, y=10 and y=15 marked]

$$p_{KDE}(y=3) = \frac{1}{Nh}\sum_{n=1}^{N} K\!\left(\frac{y - x^{(n}}{h}\right) = \frac{1}{10\cdot 4}\left[K\!\left(\tfrac{3-4}{4}\right) + K\!\left(\tfrac{3-5}{4}\right) + K\!\left(\tfrac{3-5}{4}\right) + K\!\left(\tfrac{3-6}{4}\right) + \dots + K\!\left(\tfrac{3-17}{4}\right)\right] = \frac{1}{10\cdot 4}\left[1+0+0+0+0+0+0+0+0+0\right] = 0.025$$

$$p_{KDE}(y=10) = \frac{1}{10\cdot 4}\left[0+0+0+0+0+0+0+0+0+0\right] = 0$$

$$p_{KDE}(y=15) = \frac{1}{10\cdot 4}\left[0+0+0+0+0+1+1+1+1+0\right] = 0.1$$
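These three values are easy to verify numerically; a self-contained sketch of the same calculation (the hypercube kernel is re-implemented inline):

```python
import numpy as np

X = np.array([4, 5, 5, 6, 12, 14, 15, 15, 16, 17], dtype=float)
h, N = 4.0, len(X)

def K(u):
    # Parzen window (unit hypercube) kernel in one dimension
    return np.where(np.abs(u) < 0.5, 1.0, 0.0)

for y in (3.0, 10.0, 15.0):
    p = np.sum(K((y - X) / h)) / (N * h)
    print(f"p_KDE({y:.0f}) = {p:.3f}")   # expected: 0.025, 0.000, 0.100
```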
Smooth kernels (1)
g The Parzen window has several drawbacks
n Yields density estimates that have discontinuities
n Weights all the points x(n equally, regardless of their distance to the estimation point x
g It is easy to overcome some of these difficulties by generalizing the Parzen window with a smooth kernel function K(u) which satisfies the condition
n Usually, but not always, K(u) will be a radially symmetric and unimodal probability density function, such as the multivariate Gaussian density function
n where the expression of the density estimate remains the same as with Parzen
windows
$$\int_{R^D} K(x)\,dx = 1$$

$$K(x) = \frac{1}{(2\pi)^{D/2}} \exp\!\left(-\tfrac{1}{2}\,x^T x\right)$$

$$p_{KDE}(x) = \frac{1}{N h^D} \sum_{n=1}^{N} K\!\left(\frac{x - x^{(n}}{h}\right)$$
From [Bishop, 1995]
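The same estimator with the Gaussian kernel above, as a minimal NumPy sketch (illustrative names; nothing here is specific to the lecture's datasets):

```python
import numpy as np

def gaussian_kde(X, x, h):
    """Smooth kernel estimate p_KDE(x) = 1/(N h^D) * sum_n K((x - x(n)/h)
    with the Gaussian kernel K(u) = (2*pi)^(-D/2) exp(-u'u / 2)."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    x = np.atleast_1d(np.asarray(x, dtype=float))
    N, D = X.shape
    u = (x - X) / h                                        # N x D scaled offsets
    K = np.exp(-0.5 * np.sum(u * u, axis=1)) / (2 * np.pi) ** (D / 2)
    return np.sum(K) / (N * h ** D)

X = [4, 5, 5, 6, 12, 14, 15, 15, 16, 17]
print(gaussian_kde(X, x=10.0, h=3.0))   # a smooth, nonzero estimate between the clusters
```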
Smooth kernels (2)
g Just as the Parzen window estimate can be considered a sum of boxes
centered at the observations, the smooth kernel estimate is a sum of
“bumps” placed at the data points
n The kernel function determines the shape of the bumps
n The parameter h, also called the smoothing parameter or bandwidth,
determines their width
[Figure: P_KDE(x) with h=3 for a one-dimensional dataset; the density estimate is the sum of Gaussian kernel functions ("bumps") placed at the data points]
Choosing the bandwidth: univariate case (1)
[Figure: P_KDE(x) for the same dataset computed with bandwidths h=1.0, h=2.5, h=5.0 and h=10.0, showing increasingly smooth estimates]
g The problem of choosing the bandwidth is crucial in density estimation
n A large bandwidth will over-smooth the density and mask the structure in the data
n A small bandwidth will yield a density estimate that is spiky and very hard to interpret
Choosing the bandwidth: univariate case (2)
g We would like to find a value of the smoothing parameter that minimizes the error between
the estimated density and the true density
n A natural measure is the mean square error at the estimation point x, defined by
g This expression is an example of the bias-variance tradeoff that we saw earlier in the
course: the bias can be reduced at the expense of the variance, and vice versa
n The bias of an estimate is the systematic error incurred in the estimation
n The variance of an estimate is the random error incurred in the estimation
g The bias-variance dilemma applied to bandwidth selection simply means that
n A large bandwidth will reduce the differences among the estimates of pKDE(x) for different data sets (the
variance) but it will increase the bias of pKDE(x) with respect to the true density p(x)
n A small bandwidth will reduce the bias of pKDE(x), at the expense of a larger variance in the estimates pKDE(x)
$$MSE_x\!\left(p_{KDE}\right) = E\!\left[\left(p_{KDE}(x) - p(x)\right)^2\right] = \underbrace{\left(E\!\left[p_{KDE}(x)\right] - p(x)\right)^2}_{\text{bias}^2} + \underbrace{\operatorname{var}\!\left(p_{KDE}(x)\right)}_{\text{variance}}$$
[Figure: multiple kernel density estimates of the same true density built from different datasets; h=0.1 produces widely varying estimates (VARIANCE), while h=2.0 produces similar but systematically over-smoothed estimates (BIAS)]
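The bias-variance behaviour can be reproduced with a small Monte Carlo experiment: draw many datasets from a known density, build a KDE from each, and look at the systematic offset and the spread of the estimates at a fixed point. A sketch under assumed settings (standard normal truth, Gaussian kernel, N=100, 500 trials):

```python
import numpy as np

def gauss_kde_at(x0, data, h):
    # Univariate Gaussian KDE evaluated at the single point x0
    u = (x0 - data) / h
    return np.sum(np.exp(-0.5 * u ** 2)) / (len(data) * h * np.sqrt(2 * np.pi))

rng = np.random.default_rng(3)
x0, N, trials = 0.0, 100, 500
true_p = 1 / np.sqrt(2 * np.pi)          # standard normal density at x0 = 0

for h in (0.1, 2.0):
    est = np.array([gauss_kde_at(x0, rng.normal(size=N), h) for _ in range(trials)])
    bias2 = (est.mean() - true_p) ** 2
    var = est.var()
    print(f"h={h}: bias^2={bias2:.5f}  variance={var:.5f}  MSE={bias2 + var:.5f}")
# Expected pattern: small h -> small bias, large variance; large h -> the reverse.
```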
Bandwidth selection methods, univariate case (3)
g Subjective choice
n The natural way for choosing the smoothing parameter is to plot out several curves and
choose the estimate that is most in accordance with one’s prior (subjective) ideas
n However, this method is not practical in pattern recognition since we typically have high-
dimensional data
g Reference to a standard distribution
n Assume a standard density function and find the value of the bandwidth that minimizes the
integral of the square error (MISE)
n If we assume that the true distribution is a Gaussian density and we use a Gaussian kernel,
it can be shown that the optimal value of the bandwidth becomes
g where σ is the sample standard deviation and N is the number of training examples
$$h_{opt} = \underset{h}{\operatorname{argmin}}\;\text{MISE}\!\left\{p_{KDE}(x)\right\} = \underset{h}{\operatorname{argmin}}\int E\!\left[\left(p_{KDE}(x) - p(x)\right)^2\right] dx$$

$$h_{opt} = 1.06\,\sigma\,N^{-1/5}$$
From [Silverman, 1986]
Bandwidth selection methods, univariate case (4)
n Better results can be obtained if we use a robust measure of the spread instead of the sample standard deviation and we reduce the coefficient 1.06 to better cope with multimodal densities. The optimal bandwidth then becomes
g IQR is the interquartile range, a robust estimate of the spread, computed as the difference between the 75th percentile (Q3) and the 25th percentile (Q1): IQR = Q3 − Q1 (one half of this, (Q3−Q1)/2, is the semi-interquartile range)
n A percentile rank is the proportion of examples in a distribution that a specific example is greater than or equal to
g Likelihood cross-validation
n The ML estimate of h is degenerate since it yields hML=0, a density estimate with Dirac delta
functions at each training data point
n A practical alternative is to maximize the “pseudo-likelihood” computed using leave-one-out
cross-validation
$$h_{opt} = 0.9\,A\,N^{-1/5} \qquad\text{where}\quad A = \min\!\left(\sigma,\;\frac{IQR}{1.34}\right)$$
From [Silverman, 1986]
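Both reference rules are one-liners; a sketch assuming one-dimensional data (np.std with ddof=1 is used for the sample standard deviation; names are illustrative):

```python
import numpy as np

def h_gaussian_rule(x):
    """h_opt = 1.06 * sigma * N^(-1/5)  (Gaussian reference rule)."""
    x = np.asarray(x, dtype=float)
    return 1.06 * np.std(x, ddof=1) * len(x) ** (-1 / 5)

def h_robust_rule(x):
    """h_opt = 0.9 * A * N^(-1/5), with A = min(sigma, IQR / 1.34)."""
    x = np.asarray(x, dtype=float)
    q75, q25 = np.percentile(x, [75, 25])
    A = min(np.std(x, ddof=1), (q75 - q25) / 1.34)
    return 0.9 * A * len(x) ** (-1 / 5)

X = [4, 5, 5, 6, 12, 14, 15, 15, 16, 17]
print(h_gaussian_rule(X), h_robust_rule(X))
```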
$$h_{MLCV} = \underset{h}{\operatorname{argmax}}\;\frac{1}{N}\sum_{n=1}^{N}\log p_{-n}\!\left(x^{(n}\right) \qquad\text{where}\quad p_{-n}\!\left(x^{(n}\right) = \frac{1}{(N-1)\,h}\sum_{\substack{m=1 \\ m\neq n}}^{N} K\!\left(\frac{x^{(n} - x^{(m}}{h}\right)$$
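A brute-force sketch of the pseudo-likelihood criterion, scanning a grid of candidate bandwidths with a Gaussian kernel (a simple grid search is assumed here rather than a proper optimizer):

```python
import numpy as np

def mlcv_bandwidth(x, candidates):
    """Pick h maximizing (1/N) * sum_n log p_{-n}(x(n), where p_{-n} is the
    leave-one-out KDE built from the other N-1 points (Gaussian kernel)."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    best_h, best_score = None, -np.inf
    for h in candidates:
        score = 0.0
        for n in range(N):
            others = np.delete(x, n)
            u = (x[n] - others) / h
            K = np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)
            p_loo = np.sum(K) / ((N - 1) * h)
            score += np.log(p_loo + 1e-300) / N     # guard against log(0)
        if score > best_score:
            best_h, best_score = h, score
    return best_h

X = [4, 5, 5, 6, 12, 14, 15, 15, 16, 17]
print(mlcv_bandwidth(X, candidates=np.linspace(0.5, 5.0, 46)))
```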
Multivariate density estimation
g The derived expression of the estimate PKDE(x) for multiple dimensions was
n Notice that the bandwidth h is the same for all the axes, so this density estimate weights all axes equally
g However, if the spread of the data is much greater in one of the coordinate
directions than the others, we should use a vector of smoothing parameters or
even a full covariance matrix, which complicates the procedure
g There are two basic alternatives to solve the scaling problem without having to
use a more general kernel density estimate
n Pre-scale each axis (normalize to unit variance, for instance)
n Pre-whiten the data (linearly transform to have unit covariance matrix), estimate the
density, and then transform back [Fukunaga]
g The whitening transform is simply y = Λ^{-1/2} M^T x, where Λ and M are the eigenvalue and eigenvector matrices of the sample covariance of x
g Fukunaga’s method is equivalent to using a
hyper-ellipsoidal kernel
$$p_{KDE}(x) = \frac{1}{N h^D} \sum_{n=1}^{N} K\!\left(\frac{x - x^{(n}}{h}\right)$$
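A sketch of the pre-whitening approach with a Gaussian kernel. The back-transform of the density through y = Λ^{-1/2} M^T x picks up the Jacobian |det(Λ^{-1/2} M^T)| = 1/√(det Λ), which is included below; that step is my addition, not spelled out on the slide:

```python
import numpy as np

def whitened_kde(X, x, h):
    """KDE after pre-whitening: y = Lambda^{-1/2} M^T x, with Lambda, M from the
    eigendecomposition of the sample covariance. The estimate computed in the
    whitened space is mapped back with the Jacobian |det W| = 1/sqrt(det Lambda)."""
    X = np.asarray(X, dtype=float)
    x = np.asarray(x, dtype=float)
    N, D = X.shape
    evals, M = np.linalg.eigh(np.cov(X, rowvar=False))
    W = np.diag(evals ** -0.5) @ M.T          # whitening matrix
    Y, y = X @ W.T, W @ x                     # whitened data and query point
    u = (y - Y) / h
    K = np.exp(-0.5 * np.sum(u * u, axis=1)) / (2 * np.pi) ** (D / 2)
    p_y = np.sum(K) / (N * h ** D)            # density in the whitened space
    return p_y * np.abs(np.linalg.det(W))     # back-transform to the original space

rng = np.random.default_rng(0)
X = rng.multivariate_normal([0, 0], [[4.0, 1.5], [1.5, 1.0]], size=200)
print(whitened_kde(X, x=np.array([0.0, 0.0]), h=0.5))
```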
Product kernels
g A very popular method for performing multivariate density estimation is the
product kernel, defined as
n The product kernel consists of the product of one-dimensional kernels
g Typically the same kernel function is used in each dimension ( Kd(x)=K(x) ), and
only the bandwidths are allowed to differ
n Bandwidth selection can then be performed with any of the methods presented for univariate
density estimation
g It is important to notice that although the expression of K(x,x(n,h1,…hD) uses
kernel independence, this does not imply that any type of feature independence
is being assumed
n A density estimation method that assumed feature independence would have the following
expression
n Notice how the order of the summation and the product is reversed compared to the product kernel
$$p_{PKDE}(x) = \frac{1}{N}\sum_{n=1}^{N} K\!\left(x, x^{(n}, h_1, \dots, h_D\right) \qquad\text{where}\quad K\!\left(x, x^{(n}, h_1, \dots, h_D\right) = \frac{1}{h_1 h_2 \cdots h_D}\prod_{d=1}^{D} K_d\!\left(\frac{x(d) - x^{(n}(d)}{h_d}\right)$$

$$p_{FEAT-IND}(x) = \prod_{d=1}^{D}\left[\frac{1}{N h_d}\sum_{n=1}^{N} K_d\!\left(\frac{x(d) - x^{(n}(d)}{h_d}\right)\right]$$
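A sketch of the product kernel estimate with Gaussian one-dimensional kernels and a per-dimension bandwidth vector; choosing each h_d with the univariate robust rule is one reasonable option, used here purely for illustration:

```python
import numpy as np

def product_kernel_kde(X, x, h):
    """Product kernel estimate:
    p(x) = 1/N * sum_n [ 1/(h_1...h_D) * prod_d K((x(d) - x(n(d)) / h_d) ]
    with a Gaussian kernel in each dimension."""
    X = np.asarray(X, dtype=float)          # N x D
    x = np.asarray(x, dtype=float)          # length-D query point
    h = np.asarray(h, dtype=float)          # length-D bandwidth vector
    N, D = X.shape
    u = (x - X) / h                         # each column scaled by its own h_d
    K = np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)   # N x D one-dimensional kernels
    per_point = np.prod(K, axis=1) / np.prod(h)      # product over dimensions
    return np.sum(per_point) / N

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2)) * [1.0, 3.0] + [0.0, 5.0]
sigma = X.std(axis=0, ddof=1)
iqr = np.percentile(X, 75, axis=0) - np.percentile(X, 25, axis=0)
h = 0.9 * np.minimum(sigma, iqr / 1.34) * len(X) ** (-1 / 5)   # per-axis robust rule
print(product_kernel_kde(X, x=np.array([0.0, 5.0]), h=h))
```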
Product kernel, example 1
[Figure: contour plots in the (x1, x2) plane showing the true bivariate unimodal Gaussian density (left) and the product kernel estimates obtained with h=1.06σN^{-1/5} (middle) and h=0.9AN^{-1/5} (right)]
g This example shows the product kernel density estimate of a bivariate unimodal Gaussian
distribution
n 100 data points were drawn from the distribution
n The figures show the true density (left) and the estimates using h=1.06σN^{-1/5} (middle) and h=0.9AN^{-1/5} (right)
Product kernel, example 2
[Figure: contour plots in the (x1, x2) plane showing the true bivariate bimodal Gaussian density (left) and the product kernel estimates obtained with h=1.06σN^{-1/5} (middle) and h=0.9AN^{-1/5} (right)]
g This example shows the product kernel density estimate of a bivariate bimodal Gaussian
distribution
n 100 data points were drawn from the distribution
n The figures show the true density (left) and the estimates using h=1.06σN^{-1/5} (middle) and h=0.9AN^{-1/5} (right)
Naïve Bayes classifier
g Recall that the Bayes classifier is given by the following family of discriminant functions
g Using Bayes rule, these discriminant functions can be expressed as
n where P(ωi) is our prior knowledge and P(x|ωi) is obtained through density estimation
g Although we have presented density estimation methods that allow us to estimate the
multivariate likelihood P(x|ωi), the curse of dimensionality makes it a very tough problem!
g One highly practical simplification of the Bayes classifier is the so-called Naïve Bayes
classifier
n The Naïve Bayes classifier makes the assumption that the features are class-conditionally independent
g It is important to notice that this assumption is not as rigid as assuming independent features
n Merging this expression into the discriminant function yields the decision rule for the Naïve Bayes classifier
g The main advantage of the Naïve Bayes classifier is that we only need to compute the
univariate densities P(x(d)|ωi), which is a much easier problem than estimating the
multivariate density P(x|ωi)
n Despite its simplicity, the Naïve Bayes has been shown to have comparable performance to artificial neural
networks and decision tree learning in some domains
$$\text{choose } \omega_i \;\text{ if }\; g_i(x) > g_j(x) \;\;\forall\, j \neq i, \qquad\text{where}\quad g_i(x) = P(\omega_i \mid x)$$

$$g_i(x) = P(\omega_i \mid x) \propto P(x \mid \omega_i)\,P(\omega_i)$$

Naïve Bayes assumption (class-conditional independence):

$$P(x \mid \omega_i) = \prod_{d=1}^{D} P\!\left(x(d) \mid \omega_i\right)$$

$$g_{NB,i}(x) = P(\omega_i)\prod_{d=1}^{D} P\!\left(x(d) \mid \omega_i\right)$$

(By contrast, assuming fully independent features would amount to $P(x) = \prod_{d=1}^{D} P(x(d))$.)
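As an illustration of how the pieces fit together, the sketch below builds a Naïve Bayes classifier whose class-conditional factors P(x(d)|ωi) are univariate Gaussian kernel density estimates; the bandwidth rule and all names are my own choices, not part of the lecture:

```python
import numpy as np

def kde_1d(train, query, h):
    """Univariate Gaussian KDE evaluated at a scalar query point."""
    u = (query - train) / h
    return np.sum(np.exp(-0.5 * u ** 2)) / (len(train) * h * np.sqrt(2 * np.pi))

def naive_bayes_kde(X_train, y_train, x):
    """g_NB,i(x) = P(w_i) * prod_d P(x(d)|w_i); returns the winning class label."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    scores = {}
    for c in np.unique(y_train):
        Xc = X_train[y_train == c]
        prior = len(Xc) / len(X_train)
        log_g = np.log(prior)
        for d in range(X_train.shape[1]):
            h = 1.06 * np.std(Xc[:, d], ddof=1) * len(Xc) ** (-1 / 5)  # Gaussian rule
            log_g += np.log(kde_1d(Xc[:, d], x[d], h) + 1e-300)        # guard log(0)
        scores[c] = log_g
    return max(scores, key=scores.get)

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 1, size=(50, 2)), rng.normal(3, 1, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)
print(naive_bayes_kde(X, y, x=np.array([2.5, 2.5])))   # expected: class 1
```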

More Related Content

What's hot

14_autoencoders.pdf
14_autoencoders.pdf14_autoencoders.pdf
14_autoencoders.pdf
KSChidanandKumarJSSS
 
Gaussian Mixture Models
Gaussian Mixture ModelsGaussian Mixture Models
Gaussian Mixture Modelsguestfee8698
 
Linear Discriminant Analysis (LDA)
Linear Discriminant Analysis (LDA)Linear Discriminant Analysis (LDA)
Linear Discriminant Analysis (LDA)
Anmol Dwivedi
 
PCA (Principal component analysis)
PCA (Principal component analysis)PCA (Principal component analysis)
PCA (Principal component analysis)
Learnbay Datascience
 
Introduction to the theory of optimization
Introduction to the theory of optimizationIntroduction to the theory of optimization
Introduction to the theory of optimization
Delta Pi Systems
 
Application of Chebyshev and Markov Inequality in Machine Learning
Application of Chebyshev and Markov Inequality in Machine LearningApplication of Chebyshev and Markov Inequality in Machine Learning
Application of Chebyshev and Markov Inequality in Machine Learning
VARUN KUMAR
 
Algorithm For optimization.pptx
Algorithm For optimization.pptxAlgorithm For optimization.pptx
Algorithm For optimization.pptx
KARISHMA JAIN
 
Matrix Factorization
Matrix FactorizationMatrix Factorization
Matrix Factorization
Yusuke Yamamoto
 
Image Classification And Support Vector Machine
Image Classification And Support Vector MachineImage Classification And Support Vector Machine
Image Classification And Support Vector MachineShao-Chuan Wang
 
Kernel density estimation (kde)
Kernel density estimation (kde)Kernel density estimation (kde)
Kernel density estimation (kde)
Padma Metta
 
Principal Component Analysis for Tensor Analysis and EEG classification
Principal Component Analysis for Tensor Analysis and EEG classificationPrincipal Component Analysis for Tensor Analysis and EEG classification
Principal Component Analysis for Tensor Analysis and EEG classification
Tatsuya Yokota
 
Belief Networks & Bayesian Classification
Belief Networks & Bayesian ClassificationBelief Networks & Bayesian Classification
Belief Networks & Bayesian Classification
Adnan Masood
 
Different kind of distance and Statistical Distance
Different kind of distance and Statistical DistanceDifferent kind of distance and Statistical Distance
Different kind of distance and Statistical Distance
Khulna University
 
Moment Generating Functions
Moment Generating FunctionsMoment Generating Functions
Moment Generating Functions
DataminingTools Inc
 
K mean-clustering
K mean-clusteringK mean-clustering
K mean-clustering
Afzaal Subhani
 
Dimensionality Reduction
Dimensionality ReductionDimensionality Reduction
Dimensionality Reduction
mrizwan969
 
Topological data analysis
Topological data analysisTopological data analysis
Topological data analysis
Sunghyon Kyeong
 
KNN
KNNKNN
5 cramer-rao lower bound
5 cramer-rao lower bound5 cramer-rao lower bound
5 cramer-rao lower bound
Solo Hermelin
 

What's hot (20)

14_autoencoders.pdf
14_autoencoders.pdf14_autoencoders.pdf
14_autoencoders.pdf
 
Gaussian Mixture Models
Gaussian Mixture ModelsGaussian Mixture Models
Gaussian Mixture Models
 
Linear Discriminant Analysis (LDA)
Linear Discriminant Analysis (LDA)Linear Discriminant Analysis (LDA)
Linear Discriminant Analysis (LDA)
 
PCA (Principal component analysis)
PCA (Principal component analysis)PCA (Principal component analysis)
PCA (Principal component analysis)
 
CFA Fit Statistics
CFA Fit StatisticsCFA Fit Statistics
CFA Fit Statistics
 
Introduction to the theory of optimization
Introduction to the theory of optimizationIntroduction to the theory of optimization
Introduction to the theory of optimization
 
Application of Chebyshev and Markov Inequality in Machine Learning
Application of Chebyshev and Markov Inequality in Machine LearningApplication of Chebyshev and Markov Inequality in Machine Learning
Application of Chebyshev and Markov Inequality in Machine Learning
 
Algorithm For optimization.pptx
Algorithm For optimization.pptxAlgorithm For optimization.pptx
Algorithm For optimization.pptx
 
Matrix Factorization
Matrix FactorizationMatrix Factorization
Matrix Factorization
 
Image Classification And Support Vector Machine
Image Classification And Support Vector MachineImage Classification And Support Vector Machine
Image Classification And Support Vector Machine
 
Kernel density estimation (kde)
Kernel density estimation (kde)Kernel density estimation (kde)
Kernel density estimation (kde)
 
Principal Component Analysis for Tensor Analysis and EEG classification
Principal Component Analysis for Tensor Analysis and EEG classificationPrincipal Component Analysis for Tensor Analysis and EEG classification
Principal Component Analysis for Tensor Analysis and EEG classification
 
Belief Networks & Bayesian Classification
Belief Networks & Bayesian ClassificationBelief Networks & Bayesian Classification
Belief Networks & Bayesian Classification
 
Different kind of distance and Statistical Distance
Different kind of distance and Statistical DistanceDifferent kind of distance and Statistical Distance
Different kind of distance and Statistical Distance
 
Moment Generating Functions
Moment Generating FunctionsMoment Generating Functions
Moment Generating Functions
 
K mean-clustering
K mean-clusteringK mean-clustering
K mean-clustering
 
Dimensionality Reduction
Dimensionality ReductionDimensionality Reduction
Dimensionality Reduction
 
Topological data analysis
Topological data analysisTopological data analysis
Topological data analysis
 
KNN
KNNKNN
KNN
 
5 cramer-rao lower bound
5 cramer-rao lower bound5 cramer-rao lower bound
5 cramer-rao lower bound
 

Similar to Kernel estimation(ref)

2012 mdsp pr08 nonparametric approach
2012 mdsp pr08 nonparametric approach2012 mdsp pr08 nonparametric approach
2012 mdsp pr08 nonparametric approachnozomuhamada
 
Lecture 8
Lecture 8Lecture 8
Lecture 8
Zahra Amini
 
Delayed acceptance for Metropolis-Hastings algorithms
Delayed acceptance for Metropolis-Hastings algorithmsDelayed acceptance for Metropolis-Hastings algorithms
Delayed acceptance for Metropolis-Hastings algorithms
Christian Robert
 
Expectation propagation
Expectation propagationExpectation propagation
Expectation propagation
Dong Guo
 
2012 mdsp pr04 monte carlo
2012 mdsp pr04 monte carlo2012 mdsp pr04 monte carlo
2012 mdsp pr04 monte carlonozomuhamada
 
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
The Statistical and Applied Mathematical Sciences Institute
 
CSC446: Pattern Recognition (LN7)
CSC446: Pattern Recognition (LN7)CSC446: Pattern Recognition (LN7)
CSC446: Pattern Recognition (LN7)
Mostafa G. M. Mostafa
 
Robust Fuzzy Data Clustering In An Ordinal Scale Based On A Similarity Measure
Robust Fuzzy Data Clustering In An Ordinal Scale Based On A Similarity MeasureRobust Fuzzy Data Clustering In An Ordinal Scale Based On A Similarity Measure
Robust Fuzzy Data Clustering In An Ordinal Scale Based On A Similarity Measure
IJRES Journal
 
Unbiased Bayes for Big Data
Unbiased Bayes for Big DataUnbiased Bayes for Big Data
Unbiased Bayes for Big Data
Christian Robert
 
Clique and sting
Clique and stingClique and sting
Clique and sting
Subramanyam Natarajan
 
Probability distribution
Probability distributionProbability distribution
Probability distributionRanjan Kumar
 
A nonlinear approximation of the Bayesian Update formula
A nonlinear approximation of the Bayesian Update formulaA nonlinear approximation of the Bayesian Update formula
A nonlinear approximation of the Bayesian Update formula
Alexander Litvinenko
 
Introduction to Evidential Neural Networks
Introduction to Evidential Neural NetworksIntroduction to Evidential Neural Networks
Introduction to Evidential Neural Networks
Federico Cerutti
 
Anomaly detection using deep one class classifier
Anomaly detection using deep one class classifierAnomaly detection using deep one class classifier
Anomaly detection using deep one class classifier
홍배 김
 
2 random variables notes 2p3
2 random variables notes 2p32 random variables notes 2p3
2 random variables notes 2p3
MuhannadSaleh
 
Dirichlet processes and Applications
Dirichlet processes and ApplicationsDirichlet processes and Applications
Dirichlet processes and Applications
Saurav Jha
 
2012 mdsp pr09 pca lda
2012 mdsp pr09 pca lda2012 mdsp pr09 pca lda
2012 mdsp pr09 pca ldanozomuhamada
 

Similar to Kernel estimation(ref) (20)

2012 mdsp pr08 nonparametric approach
2012 mdsp pr08 nonparametric approach2012 mdsp pr08 nonparametric approach
2012 mdsp pr08 nonparametric approach
 
Lecture 8
Lecture 8Lecture 8
Lecture 8
 
Delayed acceptance for Metropolis-Hastings algorithms
Delayed acceptance for Metropolis-Hastings algorithmsDelayed acceptance for Metropolis-Hastings algorithms
Delayed acceptance for Metropolis-Hastings algorithms
 
Expectation propagation
Expectation propagationExpectation propagation
Expectation propagation
 
2012 mdsp pr04 monte carlo
2012 mdsp pr04 monte carlo2012 mdsp pr04 monte carlo
2012 mdsp pr04 monte carlo
 
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
 
CSC446: Pattern Recognition (LN7)
CSC446: Pattern Recognition (LN7)CSC446: Pattern Recognition (LN7)
CSC446: Pattern Recognition (LN7)
 
Robust Fuzzy Data Clustering In An Ordinal Scale Based On A Similarity Measure
Robust Fuzzy Data Clustering In An Ordinal Scale Based On A Similarity MeasureRobust Fuzzy Data Clustering In An Ordinal Scale Based On A Similarity Measure
Robust Fuzzy Data Clustering In An Ordinal Scale Based On A Similarity Measure
 
kcde
kcdekcde
kcde
 
Unbiased Bayes for Big Data
Unbiased Bayes for Big DataUnbiased Bayes for Big Data
Unbiased Bayes for Big Data
 
Clique and sting
Clique and stingClique and sting
Clique and sting
 
Probability distribution
Probability distributionProbability distribution
Probability distribution
 
A nonlinear approximation of the Bayesian Update formula
A nonlinear approximation of the Bayesian Update formulaA nonlinear approximation of the Bayesian Update formula
A nonlinear approximation of the Bayesian Update formula
 
Introduction to Evidential Neural Networks
Introduction to Evidential Neural NetworksIntroduction to Evidential Neural Networks
Introduction to Evidential Neural Networks
 
Anomaly detection using deep one class classifier
Anomaly detection using deep one class classifierAnomaly detection using deep one class classifier
Anomaly detection using deep one class classifier
 
2 random variables notes 2p3
2 random variables notes 2p32 random variables notes 2p3
2 random variables notes 2p3
 
Dirichlet processes and Applications
Dirichlet processes and ApplicationsDirichlet processes and Applications
Dirichlet processes and Applications
 
eatonmuirheadsoaita
eatonmuirheadsoaitaeatonmuirheadsoaita
eatonmuirheadsoaita
 
Slides univ-van-amsterdam
Slides univ-van-amsterdamSlides univ-van-amsterdam
Slides univ-van-amsterdam
 
2012 mdsp pr09 pca lda
2012 mdsp pr09 pca lda2012 mdsp pr09 pca lda
2012 mdsp pr09 pca lda
 

More from Zahra Amini

Dip azimifar enhancement_l05_2020
Dip azimifar enhancement_l05_2020Dip azimifar enhancement_l05_2020
Dip azimifar enhancement_l05_2020
Zahra Amini
 
Dip azimifar enhancement_l04_2020
Dip azimifar enhancement_l04_2020Dip azimifar enhancement_l04_2020
Dip azimifar enhancement_l04_2020
Zahra Amini
 
Dip azimifar enhancement_l03_2020
Dip azimifar enhancement_l03_2020Dip azimifar enhancement_l03_2020
Dip azimifar enhancement_l03_2020
Zahra Amini
 
Dip azimifar enhancement_l02_2020
Dip azimifar enhancement_l02_2020Dip azimifar enhancement_l02_2020
Dip azimifar enhancement_l02_2020
Zahra Amini
 
Dip azimifar enhancement_l01_2020
Dip azimifar enhancement_l01_2020Dip azimifar enhancement_l01_2020
Dip azimifar enhancement_l01_2020
Zahra Amini
 
Ch 1-3 nn learning 1-7
Ch 1-3 nn learning 1-7Ch 1-3 nn learning 1-7
Ch 1-3 nn learning 1-7
Zahra Amini
 
Ch 1-2 NN classifier
Ch 1-2 NN classifierCh 1-2 NN classifier
Ch 1-2 NN classifier
Zahra Amini
 
Ch 1-1 introduction
Ch 1-1 introductionCh 1-1 introduction
Ch 1-1 introduction
Zahra Amini
 
Dr azimifar pattern recognition lect4
Dr azimifar pattern recognition lect4Dr azimifar pattern recognition lect4
Dr azimifar pattern recognition lect4
Zahra Amini
 
Dr azimifar pattern recognition lect2
Dr azimifar pattern recognition lect2Dr azimifar pattern recognition lect2
Dr azimifar pattern recognition lect2
Zahra Amini
 
Dr azimifar pattern recognition lect1
Dr azimifar pattern recognition lect1Dr azimifar pattern recognition lect1
Dr azimifar pattern recognition lect1
Zahra Amini
 

More from Zahra Amini (11)

Dip azimifar enhancement_l05_2020
Dip azimifar enhancement_l05_2020Dip azimifar enhancement_l05_2020
Dip azimifar enhancement_l05_2020
 
Dip azimifar enhancement_l04_2020
Dip azimifar enhancement_l04_2020Dip azimifar enhancement_l04_2020
Dip azimifar enhancement_l04_2020
 
Dip azimifar enhancement_l03_2020
Dip azimifar enhancement_l03_2020Dip azimifar enhancement_l03_2020
Dip azimifar enhancement_l03_2020
 
Dip azimifar enhancement_l02_2020
Dip azimifar enhancement_l02_2020Dip azimifar enhancement_l02_2020
Dip azimifar enhancement_l02_2020
 
Dip azimifar enhancement_l01_2020
Dip azimifar enhancement_l01_2020Dip azimifar enhancement_l01_2020
Dip azimifar enhancement_l01_2020
 
Ch 1-3 nn learning 1-7
Ch 1-3 nn learning 1-7Ch 1-3 nn learning 1-7
Ch 1-3 nn learning 1-7
 
Ch 1-2 NN classifier
Ch 1-2 NN classifierCh 1-2 NN classifier
Ch 1-2 NN classifier
 
Ch 1-1 introduction
Ch 1-1 introductionCh 1-1 introduction
Ch 1-1 introduction
 
Dr azimifar pattern recognition lect4
Dr azimifar pattern recognition lect4Dr azimifar pattern recognition lect4
Dr azimifar pattern recognition lect4
 
Dr azimifar pattern recognition lect2
Dr azimifar pattern recognition lect2Dr azimifar pattern recognition lect2
Dr azimifar pattern recognition lect2
 
Dr azimifar pattern recognition lect1
Dr azimifar pattern recognition lect1Dr azimifar pattern recognition lect1
Dr azimifar pattern recognition lect1
 

Recently uploaded

RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
thanhdowork
 
Student information management system project report ii.pdf
Student information management system project report ii.pdfStudent information management system project report ii.pdf
Student information management system project report ii.pdf
Kamal Acharya
 
Gen AI Study Jams _ For the GDSC Leads in India.pdf
Gen AI Study Jams _ For the GDSC Leads in India.pdfGen AI Study Jams _ For the GDSC Leads in India.pdf
Gen AI Study Jams _ For the GDSC Leads in India.pdf
gdsczhcet
 
DfMAy 2024 - key insights and contributions
DfMAy 2024 - key insights and contributionsDfMAy 2024 - key insights and contributions
DfMAy 2024 - key insights and contributions
gestioneergodomus
 
space technology lecture notes on satellite
space technology lecture notes on satellitespace technology lecture notes on satellite
space technology lecture notes on satellite
ongomchris
 
Standard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - NeometrixStandard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - Neometrix
Neometrix_Engineering_Pvt_Ltd
 
Investor-Presentation-Q1FY2024 investor presentation document.pptx
Investor-Presentation-Q1FY2024 investor presentation document.pptxInvestor-Presentation-Q1FY2024 investor presentation document.pptx
Investor-Presentation-Q1FY2024 investor presentation document.pptx
AmarGB2
 
Basic Industrial Engineering terms for apparel
Basic Industrial Engineering terms for apparelBasic Industrial Engineering terms for apparel
Basic Industrial Engineering terms for apparel
top1002
 
MCQ Soil mechanics questions (Soil shear strength).pdf
MCQ Soil mechanics questions (Soil shear strength).pdfMCQ Soil mechanics questions (Soil shear strength).pdf
MCQ Soil mechanics questions (Soil shear strength).pdf
Osamah Alsalih
 
Governing Equations for Fundamental Aerodynamics_Anderson2010.pdf
Governing Equations for Fundamental Aerodynamics_Anderson2010.pdfGoverning Equations for Fundamental Aerodynamics_Anderson2010.pdf
Governing Equations for Fundamental Aerodynamics_Anderson2010.pdf
WENKENLI1
 
Railway Signalling Principles Edition 3.pdf
Railway Signalling Principles Edition 3.pdfRailway Signalling Principles Edition 3.pdf
Railway Signalling Principles Edition 3.pdf
TeeVichai
 
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
bakpo1
 
ML for identifying fraud using open blockchain data.pptx
ML for identifying fraud using open blockchain data.pptxML for identifying fraud using open blockchain data.pptx
ML for identifying fraud using open blockchain data.pptx
Vijay Dialani, PhD
 
English lab ppt no titlespecENG PPTt.pdf
English lab ppt no titlespecENG PPTt.pdfEnglish lab ppt no titlespecENG PPTt.pdf
English lab ppt no titlespecENG PPTt.pdf
BrazilAccount1
 
CW RADAR, FMCW RADAR, FMCW ALTIMETER, AND THEIR PARAMETERS
CW RADAR, FMCW RADAR, FMCW ALTIMETER, AND THEIR PARAMETERSCW RADAR, FMCW RADAR, FMCW ALTIMETER, AND THEIR PARAMETERS
CW RADAR, FMCW RADAR, FMCW ALTIMETER, AND THEIR PARAMETERS
veerababupersonal22
 
CME397 Surface Engineering- Professional Elective
CME397 Surface Engineering- Professional ElectiveCME397 Surface Engineering- Professional Elective
CME397 Surface Engineering- Professional Elective
karthi keyan
 
Forklift Classes Overview by Intella Parts
Forklift Classes Overview by Intella PartsForklift Classes Overview by Intella Parts
Forklift Classes Overview by Intella Parts
Intella Parts
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
zwunae
 
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdfAKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
SamSarthak3
 
block diagram and signal flow graph representation
block diagram and signal flow graph representationblock diagram and signal flow graph representation
block diagram and signal flow graph representation
Divya Somashekar
 

Recently uploaded (20)

RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
 
Student information management system project report ii.pdf
Student information management system project report ii.pdfStudent information management system project report ii.pdf
Student information management system project report ii.pdf
 
Gen AI Study Jams _ For the GDSC Leads in India.pdf
Gen AI Study Jams _ For the GDSC Leads in India.pdfGen AI Study Jams _ For the GDSC Leads in India.pdf
Gen AI Study Jams _ For the GDSC Leads in India.pdf
 
DfMAy 2024 - key insights and contributions
DfMAy 2024 - key insights and contributionsDfMAy 2024 - key insights and contributions
DfMAy 2024 - key insights and contributions
 
space technology lecture notes on satellite
space technology lecture notes on satellitespace technology lecture notes on satellite
space technology lecture notes on satellite
 
Standard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - NeometrixStandard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - Neometrix
 
Investor-Presentation-Q1FY2024 investor presentation document.pptx
Investor-Presentation-Q1FY2024 investor presentation document.pptxInvestor-Presentation-Q1FY2024 investor presentation document.pptx
Investor-Presentation-Q1FY2024 investor presentation document.pptx
 
Basic Industrial Engineering terms for apparel
Basic Industrial Engineering terms for apparelBasic Industrial Engineering terms for apparel
Basic Industrial Engineering terms for apparel
 
MCQ Soil mechanics questions (Soil shear strength).pdf
MCQ Soil mechanics questions (Soil shear strength).pdfMCQ Soil mechanics questions (Soil shear strength).pdf
MCQ Soil mechanics questions (Soil shear strength).pdf
 
Governing Equations for Fundamental Aerodynamics_Anderson2010.pdf
Governing Equations for Fundamental Aerodynamics_Anderson2010.pdfGoverning Equations for Fundamental Aerodynamics_Anderson2010.pdf
Governing Equations for Fundamental Aerodynamics_Anderson2010.pdf
 
Railway Signalling Principles Edition 3.pdf
Railway Signalling Principles Edition 3.pdfRailway Signalling Principles Edition 3.pdf
Railway Signalling Principles Edition 3.pdf
 
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
 
ML for identifying fraud using open blockchain data.pptx
ML for identifying fraud using open blockchain data.pptxML for identifying fraud using open blockchain data.pptx
ML for identifying fraud using open blockchain data.pptx
 
English lab ppt no titlespecENG PPTt.pdf
English lab ppt no titlespecENG PPTt.pdfEnglish lab ppt no titlespecENG PPTt.pdf
English lab ppt no titlespecENG PPTt.pdf
 
CW RADAR, FMCW RADAR, FMCW ALTIMETER, AND THEIR PARAMETERS
CW RADAR, FMCW RADAR, FMCW ALTIMETER, AND THEIR PARAMETERSCW RADAR, FMCW RADAR, FMCW ALTIMETER, AND THEIR PARAMETERS
CW RADAR, FMCW RADAR, FMCW ALTIMETER, AND THEIR PARAMETERS
 
CME397 Surface Engineering- Professional Elective
CME397 Surface Engineering- Professional ElectiveCME397 Surface Engineering- Professional Elective
CME397 Surface Engineering- Professional Elective
 
Forklift Classes Overview by Intella Parts
Forklift Classes Overview by Intella PartsForklift Classes Overview by Intella Parts
Forklift Classes Overview by Intella Parts
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
 
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdfAKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
 
block diagram and signal flow graph representation
block diagram and signal flow graph representationblock diagram and signal flow graph representation
block diagram and signal flow graph representation
 

Kernel estimation(ref)

  • 1. Introduction to Pattern Analysis Ricardo Gutierrez-Osuna Texas A&M University 1 LECTURE 7: Kernel Density Estimation g Non-parametric Density Estimation g Histograms g Parzen Windows g Smooth Kernels g Product Kernel Density Estimation g The Naïve Bayes Classifier
  • 2. Introduction to Pattern Analysis Ricardo Gutierrez-Osuna Texas A&M University 2 Non-parametric density estimation g In the previous two lectures we have assumed that either n The likelihoods p(x|ωi) were known (Likelihood Ratio Test) or n At least the parametric form of the likelihoods were known (Parameter Estimation) g The methods that will be presented in the next two lectures do not afford such luxuries n Instead, they attempt to estimate the density directly from the data without making any parametric assumptions about the underlying distribution n Sounds challenging? You bet! x1 x 2 P(x1, x2| ωi) NON-PARAMETRIC DENSITY ESTIMATION 0.4 0.5 0.6 0.7 0.8 0.9 1 1.1 -0.15 -0.1 -0.05 0 0.05 0.1 0.15 1 1 1 1 1 1 1 1 1 1 1 1 1 11 2 2 2 2 2 2 2 2 2 2 2 2 22 2 2 3 33 3 3 33 3 3 3 3 3 4 4 4 4 4 4 4 4 4 44 4 4 4 4 4 4 4 5 5 5 5 5 5 5 55 5 5 55 5 5 5 6 6 6 6 6 6 6 6 6 6 6 6 6 77 77 7 7 7 7 7 7 7 7 7 7 7 8 8 8 8 8 88 8 8 8 8 8 8 8 8 8 9 9 9 99 9 9 9 9 9 99 9 9 10 10 10 10 10 10 10 10 10 10 10 10 10 10 1* 1* 1* 1* 2* 2* 2* 2* 2* 2* 3* 3* 3* 3* 3* 3* 3* 4* 5* 5* 5* 5* 5* 6* 6* 6* 6* 6* 6* 6* 7* 7* 7* 7* 8* 8* 8* 9* 9* 9* 9* 9* 10* 10* 10* 10* 10* 0.4 0.5 0.6 0.7 0.8 0.9 1 1.1 -0.15 -0.1 -0.05 0 0.05 0.1 0.15 1 1 1 1 1 1 1 1 1 1 1 1 1 11 2 2 2 2 2 2 2 2 2 2 2 2 22 2 2 3 33 3 3 33 3 3 3 3 3 4 4 4 4 4 4 4 4 4 44 4 4 4 4 4 4 4 5 5 5 5 5 5 5 55 5 5 55 5 5 5 6 6 6 6 6 6 6 6 6 6 6 6 6 77 77 7 7 7 7 7 7 7 7 7 7 7 8 8 8 8 8 88 8 8 8 8 8 8 8 8 8 9 9 9 99 9 9 9 9 9 99 9 9 10 10 10 10 10 10 10 10 10 10 10 10 10 10 1* 1* 1* 1* 2* 2* 2* 2* 2* 2* 3* 3* 3* 3* 3* 3* 3* 4* 5* 5* 5* 5* 5* 6* 6* 6* 6* 6* 6* 6* 7* 7* 7* 7* 8* 8* 8* 9* 9* 9* 9* 9* 10* 10* 10* 10* 10* x1 x2
  • 3. Introduction to Pattern Analysis Ricardo Gutierrez-Osuna Texas A&M University 3 The histogram g The simplest form of non-parametric D.E. is the familiar histogram n Divide the sample space into a number of bins and approximate the density at the center of each bin by the fraction of points in the training data that fall into the corresponding bin g The histogram requires two “parameters” to be defined: bin width and starting position of the first bin g The histogram is a very simple form of D.E., but it has several drawbacks n The final shape of the density estimate depends on the starting position of the bins g For multivariate data, the final shape of the density is also affected by the orientation of the bins n The discontinuities of the estimate are not due to the underlying density, they are only an artifact of the chosen bin locations g These discontinuities make it very difficult, without experience, to grasp the structure of the data n A much more serious problem is the curse of dimensionality, since the number of bins grows exponentially with the number of dimensions g In high dimensions we would require a very large number of examples or else most of the bins would be empty g All these drawbacks make the histogram unsuitable for most practical applications except for rapid visualization of results in one or two dimensions n Therefore, we will not spend more time looking at the histogram [ ] [ ] x containing bin of width x as bin same in x of number N 1 (x) P (k H =
  • 4. Introduction to Pattern Analysis Ricardo Gutierrez-Osuna Texas A&M University 4 Non-parametric density estimation, general formulation (1) g Let us return to the basic definition of probability to get a solid idea of what we are trying to accomplish n The probability that a vector x, drawn from a distribution p(x), will fall in a given region ℜ of the sample space is n Suppose now that N vectors {x(1, x(2, …, x(N} are drawn from the distribution. The probability that k of these N vectors fall in ℜ is given by the binomial distribution n It can be shown (from the properties of the binomial p.m.f.) that the mean and variance of the ratio k/N are n Therefore, as N→∞, the distribution becomes sharper (the variance gets smaller) so we can expect that a good estimate of the probability P can be obtained from the mean fraction of the points that fall within ℜ ∫ ℜ = )dx' p(x' P ( ) k N k P) (1 P k N k P − −         = ( ) N P 1 P P N k E N k Var and P N k E 2 − =               − =       =       N k P ≅ From [Bishop, 1995]
  • 5. Introduction to Pattern Analysis Ricardo Gutierrez-Osuna Texas A&M University 5 Non-parametric density estimation, general formulation (2) n On the other hand, if we assume that ℜ is so small that p(x) does not vary appreciably within it, then g where V is the volume enclosed by region ℜ n Merging with the previous result we obtain n This estimate becomes more accurate as we increase the number of sample points N and shrink the volume V g In practice the value of N (the total number of examples) is fixed n In order to improve the accuracy of the estimate p(x) we could let V approach zero but then the region ℜ would then become so small that it would enclose no examples n This means that, in practice, we will have to find a compromise value for the volume V g Large enough to include enough examples within ℜ g Small enough to support the assumption that p(x) is constant within ℜ NV k p(x) N k P p(x)V )dx' p(x' P ≅ ⇒      ≅ ≅ = ∫ ℜ p(x)V )dx' p(x' ≅ ∫ ℜ From [Bishop, 1995]
  • 6. Introduction to Pattern Analysis Ricardo Gutierrez-Osuna Texas A&M University 6 Non-parametric density estimation, general formulation (3) g In conclusion, the general expression for non-parametric density estimation becomes g When applying this result to practical density estimation problems, two basic approaches can be adopted n We can choose a fixed value of the volume V and determine k from the data. This leads to methods commonly referred to as Kernel Density Estimation (KDE), which are the subject of this lecture n We can choose a fixed value of k and determine the corresponding volume V from the data. This gives rise to the k Nearest Neighbor (kNN) approach, which will be covered in the next lecture g It can be shown that both kNN and KDE converge to the true probability density as N→∞, provided that V shrinks with N, and k grows with N appropriately      ≅ V inside examples of number the is k examples of number total the is N x g surroundin volume the is V where NV k p(x) From [Bishop, 1995]
  • 7. Introduction to Pattern Analysis Ricardo Gutierrez-Osuna Texas A&M University 7 Parzen windows (1) g Suppose that the region ℜ that encloses the k examples is a hypercube with sides of length h centered at the estimation point x n Then its volume is given by V=hD, where D is the number of dimensions g To find the number of examples that fall within this region we define a kernel function K(u) n This kernel, which corresponds to a unit hypercube centered at the origin, is known as a Parzen window or the naïve estimator n The quantity K((x-x(n)/h) is then equal to unity if the point x(n is inside a hypercube of side h centered on x, and zero otherwise ( )    = ∀ < = otherwise 0 D 1,.., j 1/2 u 1 u K j x h h h From [Bishop, 1995]
  • 8. Introduction to Pattern Analysis Ricardo Gutierrez-Osuna Texas A&M University 8 Parzen windows (2) g The total number of points inside the hypercube is then g Substituting back into the expression for the density estimate n Notice that the Parzen window density estimate resembles the histogram, with the exception that the bin locations are determined by the data points ∑ =         − = N 1 n (n D KDE h x x K Nh 1 (x) p ∑ =         − = N 1 n n ( h x x K k Volume 1 / V K(x-x(1)=1 K(x-x(2)=1 K(x-x(3)=1 K(x-x(4)=0 x(1 x(2 x(3 x(4 x x(1 x(2 x(3 x(4 Volume 1 / V K(x-x(1)=1 K(x-x(1)=1 K(x-x(2)=1 K(x-x(2)=1 K(x-x(3)=1 K(x-x(3)=1 K(x-x(4)=0 K(x-x(4)=0 x(1 x(2 x(3 x(4 x x(1 x(2 x(3 x(4 From [Bishop, 1995]
  • 9. Introduction to Pattern Analysis Ricardo Gutierrez-Osuna Texas A&M University 9 Parzen windows (3) g To understand the role of the kernel function we compute the expectation of the probability estimate p(x) n where we have assumed that the vectors x(n are drawn independently from the true density p(x) g We can see that the expectation of the estimated density pKDE(x) is a convolution of the true density p(x) with the kernel function n The width h of the kernel plays the role of a smoothing parameter: the wider the kernel function, the smoother the estimate pKDE(x) g For h→0, the kernel approaches a Dirac delta function and pKDE(x) approaches the true density n However, in practice we have a finite number of points, so h cannot be made arbitrarily small, since the density estimate pKDE(x) would then degenerate to a set of impulses located at the training data points ( ) [ ] ∫ ∑       − =               − = =               − = = )dx' p(x' h x' x K h 1 h x x K E h 1 h x x K E Nh 1 x p E D (n D N 1 n (n D KDE From [Bishop, 1995]
  • 10. Introduction to Pattern Analysis Ricardo Gutierrez-Osuna Texas A&M University 10 Numeric exercise g Given the dataset below, use Parzen windows to estimate the density p(x) at y=3,10,15. Use a bandwidth of h=4 n X = {x(1, x(2,…x(N} = {4, 5, 5, 6, 12, 14, 15, 15, 16, 17} g Solution n Let’s first draw the dataset to get an idea of what numerical results we should expect n Let’s now estimate p(y=3): n Similarly [ ] 0.025 4 10 1 0 0 0 0 0 0 0 0 0 1 4 10 1 4 17 3 K ... 4 6 3 K 4 5 3 K 4 5 3 K 4 4 3 K 4 10 1 h x y K Nh 1 3) (y p 1 13/4 - 1 - 1/2 - 1/2 - 1/4 - 1 N 1 n (n D KDE = × = + + + + + + + + + × = =                 − + +       − +       − +       − +       − × =         − = = ∑ = 4 3 4 2 1 4 3 4 2 1 4 3 4 2 1 4 3 4 2 1 4 3 4 2 1 5 10 15 x p(x) y=3 y=10 y=15 [ ] 0 4 10 0 0 0 0 0 0 0 0 0 0 0 4 10 1 10) (y p 1 KDE = × = + + + + + + + + + × = = [ ] 0.1 4 10 4 0 1 1 1 1 0 0 0 0 0 4 10 1 15) (y p 1 KDE = × = + + + + + + + + + × = =
  • 11. Introduction to Pattern Analysis Ricardo Gutierrez-Osuna Texas A&M University 11 Smooth kernels (1) g The Parzen window has several drawbacks n Yields density estimates that have discontinuities n Weights equally all the points xi, regardless of their distance to the estimation point x g It is easy to to overcome some of these difficulties by generalizing the Parzen window with a smooth kernel function K(u) which satisfies the condition n Usually, but not not always, K(u) will be a radially symmetric and unimodal probability density function, such as the multivariate Gaussian density function n where the expression of the density estimate remains the same as with Parzen windows ( ) 1 dx x K D R = ∫ ( ) ( )       − = x x 2 1 exp π 2 1 x K T 2 / D ∑ =         − = N 1 n (n D KDE h x x K Nh 1 (x) p From [Bishop, 1995]
  • 12. Introduction to Pattern Analysis Ricardo Gutierrez-Osuna Texas A&M University 12 Smooth kernels (2) g Just as the Parzen window estimate can be considered a sum of boxes centered at the observations, the smooth kernel estimate is a sum of “bumps” placed at the data points n The kernel function determines the shape of the bumps n The parameter h, also called the smoothing parameter or bandwidth, determines their width -10 -5 0 5 10 15 20 25 30 35 40 0 0.005 0.01 0.015 0.02 0.025 0.03 0.035 0.04 0.045 x P K DE(x); h=3 Density estimate Data points Kernel functions -10 -5 0 5 10 15 20 25 30 35 40 0 0.005 0.01 0.015 0.02 0.025 0.03 0.035 0.04 0.045 x P K DE(x); h=3 Density estimate Data points Kernel functions
  • 13. Introduction to Pattern Analysis Ricardo Gutierrez-Osuna Texas A&M University 13 Choosing the bandwidth: univariate case (1) -10 -5 0 5 10 15 20 25 30 35 40 0 0.005 0.01 0.015 0.02 0.025 0.03 0.035 x P KDE (x); h=5.0 -10 -5 0 5 10 15 20 25 30 35 40 0 0.005 0.01 0.015 0.02 0.025 0.03 x P KDE (x); h=10.0 -10 -5 0 5 10 15 20 25 30 35 40 0 0.005 0.01 0.015 0.02 0.025 0.03 0.035 0.04 0.045 0.05 x P KDE (x); h=2.5 -10 -5 0 5 10 15 20 25 30 35 40 0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 x P KDE (x); h=1.0 g The problem of choosing the bandwidth is crucial in density estimation n A large bandwidth will over-smooth the density and mask the structure in the data n A small bandwidth will yield a density estimate that is spiky and very hard to interpret
  • 14. Introduction to Pattern Analysis Ricardo Gutierrez-Osuna Texas A&M University 14 Choosing the bandwidth: univariate case (2) g We would like to find a value of the smoothing parameter that minimizes the error between the estimated density and the true density n A natural measure is the mean square error at the estimation point x, defined by g This expression is an example of the bias-variance tradeoff that we saw earlier in the course: the bias can be reduced at the expense of the variance, and vice versa n The bias of an estimate is the systematic error incurred in the estimation n The variance of an estimate is the random error incurred in the estimation g The bias-variance dilemma applied to bandwidth selection simply means that n A large bandwidth will reduce the differences among the estimates of pKDE(x) for different data sets (the variance) but it will increase the bias of pKDE(x) with respect to the true density p(x) n A small bandwidth will reduce the bias of pKDE(x), at the expense of a larger variance in the estimates pKDE(x) ( ) ( ) ( ) ( ) [ ] ( ) ( ) [ ] { } ( ) ( ) 4 43 4 42 1 4 4 4 3 4 4 4 2 1 variance KDE 2 bias KDE 2 KDE KDE x x p var x p x p E x p x p E p MSE + − = − = -3 -2 -1 0 1 2 3 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 x P KDE (x); h=0.1 VARIANCE -3 -2 -1 0 1 2 3 0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 x P KDE (x); h=2.0 -3 -2 -1 0 1 2 3 0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 x P KDE (x); h=2.0 True density Multiple kernel density estimates BIAS
Bandwidth selection methods, univariate case (3)
g Subjective choice
n The natural way of choosing the smoothing parameter is to plot several curves and choose the estimate that is most in accordance with one's prior (subjective) ideas
n However, this method is not practical in pattern recognition, since we typically work with high-dimensional data
g Reference to a standard distribution
n Assume a standard density function and find the value of the bandwidth that minimizes the mean integrated square error (MISE)
h_{opt} = \arg\min_h \left\{ MISE\left(p_{KDE}\right) \right\} = \arg\min_h E\left[ \int \left(p_{KDE}(x) - p(x)\right)^2 dx \right]
n If we assume that the true distribution is Gaussian and we use a Gaussian kernel, it can be shown that the optimal value of the bandwidth becomes
h_{opt} = 1.06\,\sigma\,N^{-1/5}
g where σ is the sample standard deviation and N is the number of training examples
From [Silverman, 1986]
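A direct transcription of this rule of thumb into NumPy might look as follows; the function name and test data are illustrative.

```python
import numpy as np

def silverman_h(data):
    """Rule-of-thumb bandwidth h = 1.06 * sigma * N^(-1/5) for a Gaussian
    reference density and Gaussian kernel (sigma = sample standard deviation)."""
    return 1.06 * np.std(data, ddof=1) * len(data) ** (-1 / 5)

rng = np.random.default_rng(3)
print(silverman_h(rng.standard_normal(1000)))   # roughly 1.06 * 1000**(-0.2) ~ 0.27
```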
Bandwidth selection methods, univariate case (4)
n Better results can be obtained if we use a robust measure of the spread instead of the sample standard deviation, and if we reduce the coefficient from 1.06 to better cope with multimodal densities. The optimal bandwidth then becomes
h_{opt} = 0.9\,A\,N^{-1/5} \quad \text{where} \quad A = \min\left(\sigma, \frac{IQR}{1.34}\right)
g IQR is the interquartile range, a robust estimate of the spread, computed as the difference between the 75th percentile (Q3) and the 25th percentile (Q1): IQR = Q3 - Q1 (one half of this difference is known as the semi-interquartile range)
n A percentile rank is the proportion of examples in a distribution that a specific example is greater than or equal to
g Likelihood cross-validation
n The maximum-likelihood estimate of h is degenerate, since it yields h_ML = 0: a density estimate with Dirac delta functions at each training data point
n A practical alternative is to maximize the "pseudo-likelihood" computed using leave-one-out cross-validation
h_{MLCV} = \arg\max_h \frac{1}{N} \sum_{n=1}^{N} \log p_{-n}\left(x^{(n)}\right) \quad \text{where} \quad p_{-n}\left(x^{(n)}\right) = \frac{1}{(N-1)\,h} \sum_{\substack{m=1 \\ m \neq n}}^{N} K\left(\frac{x^{(n)} - x^{(m)}}{h}\right)
From [Silverman, 1986]
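Both the robust rule of thumb and the leave-one-out pseudo-likelihood criterion are simple to implement. The sketch below is one possible NumPy version, assuming a Gaussian kernel and a user-supplied grid of candidate bandwidths; the names, the bandwidth grid, and the small constant added before the logarithm are illustrative choices.

```python
import numpy as np

def robust_h(data):
    """h = 0.9 * A * N^(-1/5) with A = min(sigma, IQR/1.34), IQR = Q3 - Q1."""
    sigma = np.std(data, ddof=1)
    q1, q3 = np.percentile(data, [25, 75])
    A = min(sigma, (q3 - q1) / 1.34)
    return 0.9 * A * len(data) ** (-1 / 5)

def mlcv_h(data, h_grid):
    """Pick the h that maximizes the leave-one-out pseudo-log-likelihood."""
    x = np.asarray(data)
    N = len(x)
    u = x[:, None] - x[None, :]                       # pairwise differences
    best_h, best_ll = None, -np.inf
    for h in h_grid:
        K = np.exp(-0.5 * (u / h) ** 2) / (np.sqrt(2 * np.pi) * h)
        np.fill_diagonal(K, 0.0)                      # leave one out: drop m = n
        p = K.sum(axis=1) / (N - 1)                   # p_{-n}(x^(n)) for every n
        ll = np.mean(np.log(p + 1e-300))              # guard against log(0)
        if ll > best_ll:
            best_h, best_ll = h, ll
    return best_h

rng = np.random.default_rng(4)
data = np.concatenate([rng.normal(0, 1, 100), rng.normal(10, 2, 100)])
print(robust_h(data), mlcv_h(data, np.linspace(0.1, 3.0, 30)))
```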
Multivariate density estimation
g The expression derived for the kernel density estimate p_KDE(x) in multiple dimensions was
p_{KDE}(x) = \frac{1}{N h^D} \sum_{n=1}^{N} K\left(\frac{x - x^{(n)}}{h}\right)
n Notice that the bandwidth h is the same for all the axes, so this density estimate weights all the axes equally
g However, if the spread of the data is much greater in one of the coordinate directions than in the others, we should use a vector of smoothing parameters or even a full covariance matrix, which complicates the procedure
g There are two basic alternatives to solve the scaling problem without having to use a more general kernel density estimate
n Pre-scale each axis (normalize to unit variance, for instance)
n Pre-whiten the data (linearly transform it to have a unit covariance matrix), estimate the density, and then transform back [Fukunaga]
g The whitening transform is simply y = Λ^{-1/2} M^T x, where Λ and M are the eigenvalue and eigenvector matrices of the sample covariance of x
g Fukunaga's method is equivalent to using a hyper-ellipsoidal kernel
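A minimal sketch of the whitening transform follows, assuming the data are centered before computing the sample covariance (the slide leaves the mean subtraction implicit); function and variable names are illustrative.

```python
import numpy as np

def whiten(X):
    """Whitening transform y = Lambda^(-1/2) M^T (x - mean): afterwards the
    sample covariance of the transformed data is (approximately) the identity."""
    Xc = X - X.mean(axis=0)
    evals, M = np.linalg.eigh(np.cov(Xc, rowvar=False))   # Lambda (evals), M (eigenvectors)
    return Xc @ M @ np.diag(evals ** -0.5)

rng = np.random.default_rng(5)
X = rng.multivariate_normal([0, 0], [[4.0, 1.5], [1.5, 1.0]], size=2000)
Y = whiten(X)
print(np.cov(Y, rowvar=False).round(2))   # close to the 2x2 identity matrix
```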
Product kernels
g A very popular method for performing multivariate density estimation is the product kernel, defined as
p_{PKDE}(x) = \frac{1}{N} \sum_{n=1}^{N} K\left(x, x^{(n)}, h_1, \ldots, h_D\right) \quad \text{where} \quad K\left(x, x^{(n)}, h_1, \ldots, h_D\right) = \frac{1}{h_1 \cdots h_D} \prod_{d=1}^{D} K_d\left(\frac{x(d) - x^{(n)}(d)}{h_d}\right)
n The product kernel consists of the product of one-dimensional kernels
g Typically the same kernel function is used in each dimension ( K_d(x) = K(x) ), and only the bandwidths are allowed to differ
n Bandwidth selection can then be performed with any of the methods presented for univariate density estimation
g It is important to notice that although the expression of K(x, x^{(n)}, h_1, ..., h_D) uses kernel independence, this does not imply that any type of feature independence is being assumed
n A density estimation method that assumed feature independence would have the following expression
p_{FEAT-IND}(x) = \prod_{d=1}^{D} \left[ \frac{1}{N h_d} \sum_{n=1}^{N} K_d\left(\frac{x(d) - x^{(n)}(d)}{h_d}\right) \right]
n Notice how the order of the summation and the product are reversed compared to the product kernel
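One possible NumPy sketch of the product kernel estimate, assuming a 1-D Gaussian kernel in every dimension and per-axis bandwidths chosen with the univariate rule of thumb; the names and the synthetic data are illustrative.

```python
import numpy as np

def product_kde(x, data, h):
    """Product-kernel estimate: p(x) = 1/N * sum_n prod_d K((x(d) - x_n(d))/h_d)/h_d
    with a 1-D Gaussian kernel in each dimension and per-dimension bandwidths h."""
    data = np.atleast_2d(data)                    # (N, D)
    h = np.asarray(h)                             # (D,)
    u = (x - data) / h                            # scaled per-dimension distances
    K = np.exp(-0.5 * u ** 2) / (np.sqrt(2 * np.pi) * h)   # 1-D kernels, shape (N, D)
    return np.mean(np.prod(K, axis=1))            # product over d, average over n

rng = np.random.default_rng(6)
X = rng.multivariate_normal([0, 0], [[1.0, 0.8], [0.8, 2.0]], size=300)
h = 1.06 * X.std(axis=0, ddof=1) * len(X) ** (-1 / 5)   # univariate rule per axis
print(product_kde(np.array([0.0, 0.0]), X, h))
```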
Product kernel, example 1
g This example shows the product kernel density estimate of a bivariate unimodal Gaussian distribution
n 100 data points were drawn from the distribution
n The figures show the true density (left) and the estimates using h = 1.06σN^{-1/5} (middle) and h = 0.9AN^{-1/5} (right)
[Figure: plots over (x1, x2) of the true density and the two product kernel estimates]
Product kernel, example 2
g This example shows the product kernel density estimate of a bivariate bimodal Gaussian distribution
n 100 data points were drawn from the distribution
n The figures show the true density (left) and the estimates using h = 1.06σN^{-1/5} (middle) and h = 0.9AN^{-1/5} (right)
[Figure: plots over (x1, x2) of the true density and the two product kernel estimates]
Naïve Bayes classifier
g Recall that the Bayes classifier is given by the following family of discriminant functions
choose \omega_i if g_i(x) > g_j(x)\ \forall j \neq i, where g_i(x) = P(\omega_i | x)
g Using Bayes rule, these discriminant functions can be expressed as
g_i(x) = P(\omega_i | x) \propto P(x | \omega_i)\,P(\omega_i)
n where P(ω_i) is our prior knowledge and P(x|ω_i) is obtained through density estimation
g Although we have presented density estimation methods that allow us to estimate the multivariate likelihood P(x|ω_i), the curse of dimensionality makes this a very tough problem!
g One highly practical simplification of the Bayes classifier is the so-called Naïve Bayes classifier
n The Naïve Bayes classifier makes the assumption that the features are class-conditionally independent
P(x | \omega_i) = \prod_{d=1}^{D} P\left(x(d) | \omega_i\right)
g It is important to notice that this assumption is not as rigid as assuming independent features, P(x) = \prod_{d=1}^{D} P\left(x(d)\right)
n Merging this expression into the discriminant function yields the decision rule for the Naïve Bayes classifier
g_{i,NB}(x) = P(\omega_i) \prod_{d=1}^{D} P\left(x(d) | \omega_i\right)
g The main advantage of the Naïve Bayes classifier is that we only need to compute the univariate densities P(x(d)|ω_i), which is a much easier problem than estimating the multivariate density P(x|ω_i)
n Despite its simplicity, the Naïve Bayes classifier has been shown to have performance comparable to artificial neural networks and decision tree learning in some domains
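To make the decision rule concrete, the sketch below combines a univariate KDE with the Naïve Bayes discriminant, using log-probabilities for numerical stability (an implementation detail not discussed in the lecture); the class data, priors, and per-feature rule-of-thumb bandwidths are illustrative assumptions.

```python
import numpy as np

def kde_1d(x, data, h):
    """Univariate Gaussian-kernel density estimate at the point x."""
    u = (x - data) / h
    return np.mean(np.exp(-0.5 * u ** 2)) / (np.sqrt(2 * np.pi) * h)

def naive_bayes_kde(x, classes, priors):
    """g_i(x) = P(w_i) * prod_d P(x(d)|w_i), with each univariate likelihood
    P(x(d)|w_i) estimated by a 1-D KDE on that class's d-th feature."""
    scores = []
    for X_i, prior in zip(classes, priors):
        g = np.log(prior)
        for d in range(X_i.shape[1]):
            h = 1.06 * X_i[:, d].std(ddof=1) * len(X_i) ** (-1 / 5)  # rule of thumb
            g += np.log(kde_1d(x[d], X_i[:, d], h) + 1e-300)         # log for stability
        scores.append(g)
    return int(np.argmax(scores))      # index of the winning class

rng = np.random.default_rng(7)
X0 = rng.normal([0, 0], 1.0, size=(200, 2))       # class 0 training data
X1 = rng.normal([3, 3], 1.0, size=(200, 2))       # class 1 training data
print(naive_bayes_kde(np.array([2.5, 2.8]), [X0, X1], [0.5, 0.5]))   # expect class 1
```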