Radial Basis Function Networks
M.W. Mak

1. Introduction
2. Finding RBF Parameters
3. Decision Surface of RBF Networks
4. Comparison between RBF and BP
1. Introduction

MLPs are highly non-linear in the parameter space, so gradient descent
on the error surface is prone to local minima. RBF networks avoid this
problem by dividing the learning into two independent processes:
determining the basis-function parameters, and then the output weights w.
RBF networks implement the function

    s(x) = w_0 + \sum_{i=1}^{M} w_i \phi(\| x - c_i \|)

Because the w_i and the c_i can be determined separately, a fast
learning algorithm results.

Basis function types:

    \phi(r) = r^2 \log(r)                 (thin-plate spline)
    \phi(r) = \exp(-r^2 / (2\sigma^2))    (Gaussian)
    \phi(r) = (r^2 + \sigma^2)^{1/2}      (multiquadric)
    \phi(r) = (r^2 + \sigma^2)^{-1/2}     (inverse multiquadric)
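As a concrete illustration of the Gaussian case, here is a minimal
NumPy sketch of the forward pass (function and variable names are ours,
not from the slides):

```python
import numpy as np

# A minimal sketch of the RBF forward pass with Gaussian basis functions.
def rbf_forward(X, centers, sigmas, w, w0):
    # Squared distance from every input x_p to every center c_i: (N, M)
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    phi = np.exp(-d2 / (2.0 * sigmas ** 2))   # Gaussian basis outputs
    return w0 + phi @ w                       # s(x_p) for every pattern
```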
For Gaussian basis functions,

    s(x_p) = w_0 + \sum_{i=1}^{M} w_i \phi(\| x_p - c_i \|)
           = w_0 + \sum_{i=1}^{M} w_i \exp\left( -\sum_{j=1}^{n} \frac{(x_{pj} - c_{ij})^2}{2\sigma_{ij}^2} \right)

Assuming the variances across all dimensions are equal,

    s(x_p) = w_0 + \sum_{i=1}^{M} w_i \exp\left( -\frac{1}{2\sigma_i^2} \sum_{j=1}^{n} (x_{pj} - c_{ij})^2 \right)
To write this in matrix form, let a_{pi} = \phi(\| x_p - c_i \|) and
a_{p0} = 1, so that

    s(x_p) = \sum_{i=0}^{M} w_i a_{pi}

Stacking the N patterns,

    \begin{bmatrix} s(x_1) \\ s(x_2) \\ \vdots \\ s(x_N) \end{bmatrix}
    =
    \begin{bmatrix}
      1 & a_{11} & a_{12} & \cdots & a_{1M} \\
      1 & a_{21} & a_{22} & \cdots & a_{2M} \\
      \vdots & & & & \vdots \\
      1 & a_{N1} & a_{N2} & \cdots & a_{NM}
    \end{bmatrix}
    \begin{bmatrix} w_0 \\ w_1 \\ \vdots \\ w_M \end{bmatrix}

i.e. s = A w, so that w = A^{-1} s when A is square and invertible.
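A sketch of building A under the equal-variance Gaussian assumption
(names are ours):

```python
import numpy as np

# Builds the design matrix A; the leading column of ones multiplies the
# bias weight w0.
def design_matrix(X, centers, sigmas):
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    phi = np.exp(-d2 / (2.0 * sigmas ** 2))        # a_{pi}, shape (N, M)
    return np.hstack([np.ones((len(X), 1)), phi])  # shape (N, M + 1)
```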
2. Finding the RBF Parameters

Use the K-means algorithm to find the centers c_i: each center is the
mean of the training samples assigned to its cluster,

    c_i = \frac{1}{N_i} \sum_{x_p \in \text{cluster } i} x_p
K-means Algorithm

Step 1: K initial cluster centers are chosen randomly from the samples
        to form K groups.
Step 2: Each new sample is added to the group whose mean is closest to
        that sample.
Step 3: Adjust the mean of the group to take account of the new point.
Step 4: Repeat Step 2 until the distance between the old means and the
        new means of all clusters is smaller than a predefined
        tolerance.
Outcome: there are K clusters whose means represent the centroids of
the clusters.

Advantages: (1) It is a fast and simple algorithm.
            (2) It reduces the effect of noisy samples.
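A minimal batch sketch of the four steps (names are ours; it assumes no
cluster becomes empty during the iterations):

```python
import numpy as np

def kmeans(X, K, tol=1e-4, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=K, replace=False)]   # Step 1
    while True:
        # Step 2: assign every sample to the group with the closest mean
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Step 3: adjust each group's mean
        new_centers = np.array([X[labels == k].mean(axis=0)
                                for k in range(K)])
        # Step 4: stop once every mean moved less than the tolerance
        if np.linalg.norm(new_centers - centers, axis=1).max() < tol:
            return new_centers, labels
        centers = new_centers
```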
Use the K-nearest-neighbor rule to find the function widths:

    \sigma_i^2 = \frac{1}{K} \sum_{k=1}^{K} \| c_k - c_i \|^2

where c_k is the k-th nearest neighbor of c_i. The objective is to
cover the training points so that a smooth fit of the training samples
can be achieved.
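A sketch of this width rule (names are ours):

```python
import numpy as np

# sigma_i is the root of the mean squared distance from center i to its
# K nearest fellow centers.
def knn_widths(centers, K):
    d2 = ((centers[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    d2.sort(axis=1)                  # column 0 is the zero self-distance
    return np.sqrt(d2[:, 1:K + 1].mean(axis=1))
```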
Determine the weights w using the least-squares method: minimize

    E = \sum_{p=1}^{N} \left( d_p - \sum_{j=0}^{M} w_j \phi(\| x_p - c_j \|) \right)^2

where d_p is the desired output for pattern p. In matrix form,

    E = (d - Aw)^T (d - Aw)

Setting \partial E / \partial w = 0 gives

    w = (A^T A)^{-1} A^T d
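A direct transcription of this solution (a sketch with our own names;
the derivation and its numerical caveats follow on the next slides):

```python
import numpy as np

# w = (A^T A)^{-1} A^T d via the normal equations. A linear solve is
# used instead of an explicit matrix inverse.
def solve_weights(A, d):
    return np.linalg.solve(A.T @ A, A.T @ d)
```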
Let E be the total squared error between the actual output and the
target output, with d = [d_1, d_2, ..., d_N]^T. Then

    E = (d - Aw)^T (d - Aw)
      = d^T d - d^T A w - w^T A^T d + w^T A^T A w

Differentiating with respect to w and setting the result to zero,

    \frac{\partial E}{\partial w} = -2 A^T d + 2 A^T A w = 0

    \Rightarrow A^T A w = A^T d
    \Rightarrow w = (A^T A)^{-1} A^T d
Note the vector-derivative identities used above:

    \frac{\partial}{\partial x} (x^T y) = \frac{\partial}{\partial x} (y^T x) = y, \qquad
    \frac{\partial}{\partial x} (y^T A x) = A^T y, \qquad
    \frac{\partial}{\partial x} (x^T A^T A x) = 2 A^T A x

Problems:
(1) The solution is susceptible to round-off error.
(2) There is no solution if A^T A is singular.
(3) If A^T A is close to singular, we get very large components in w.
Reasons:
(1) Inaccuracy in forming A^T A.
(2) If A is ill-conditioned, a small change in A introduces a large
    change in (A^T A)^{-1}.
(3) If A^T A is close to singular, dependent columns exist in A^T A,
    e.g. the rows describe two parallel straight lines in the x-y
    plane.
Example of a singular matrix:

    \begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix}
    \begin{bmatrix} x \\ y \end{bmatrix}
    =
    \begin{bmatrix} 0 \\ 1 \end{bmatrix}

The two rows describe parallel straight lines, so no solution exists.
If the lines are nearly (but not exactly) parallel, they intersect at a
point of enormous magnitude, i.e. x \to \infty, y \to -\infty or
x \to -\infty, y \to \infty. So the magnitude of the solution becomes
very large; hence overflow will occur. The effect of the large
components could only be cancelled out if the machine precision were
infinite.
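A common remedy, suggested here rather than stated on the slides, is an
SVD-based least-squares routine that never forms (A^T A)^{-1}:

```python
import numpy as np

# SVD-based least squares avoids squaring the condition number of A;
# rcond truncates near-zero singular values that would blow up w.
def solve_weights_svd(A, d, rcond=1e-10):
    w, residuals, rank, svals = np.linalg.lstsq(A, d, rcond=rcond)
    return w
```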
RBF learning process:

    x_p --> K-means --> c_i --> K-nearest neighbor --> sigma_i
        --> basis functions --> A --> linear regression --> w
RBF learning by gradient descent. Let

    \phi_i(x_p) = \exp\left( -\frac{1}{2} \sum_{j=1}^{n} \frac{(x_{pj} - c_{ij})^2}{\sigma_{ij}^2} \right),
    \qquad e(x_p) = d(x_p) - s(x_p)

and

    E = \frac{1}{2} \sum_{p=1}^{N} e^2(x_p).

Computing \partial E / \partial w_i, \partial E / \partial \sigma_{ij},
and \partial E / \partial c_{ij} and applying gradient descent,
we have the following update equations:

    w_i(t+1) = w_i(t) + \eta_w \sum_{p=1}^{N} e(x_p) \phi_i(x_p),   when i = 1, 2, ..., M

    w_i(t+1) = w_i(t) + \eta_w \sum_{p=1}^{N} e(x_p),   when i = 0

    \sigma_{ij}(t+1) = \sigma_{ij}(t) + \eta_\sigma \sum_{p=1}^{N} e(x_p) w_i \phi_i(x_p) \frac{(x_{pj} - c_{ij}(t))^2}{\sigma_{ij}^3(t)}

    c_{ij}(t+1) = c_{ij}(t) + \eta_c \sum_{p=1}^{N} e(x_p) w_i \phi_i(x_p) \frac{x_{pj} - c_{ij}(t)}{\sigma_{ij}^2(t)}
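A sketch of one batch epoch of these updates (names are ours; eta_w,
eta_c, eta_s are the learning rates):

```python
import numpy as np

def gd_epoch(X, d, w0, w, C, S, eta_w, eta_c, eta_s):
    # phi[p, i] = exp(-0.5 * sum_j (x_pj - c_ij)^2 / sigma_ij^2)
    diff = X[:, None, :] - C[None, :, :]             # (N, M, n)
    phi = np.exp(-0.5 * ((diff / S) ** 2).sum(-1))   # (N, M)
    e = d - (w0 + phi @ w)                           # errors e(x_p)
    g = e[:, None] * phi * w[None, :]                # common factor
    w0 = w0 + eta_w * e.sum()                        # bias update (i = 0)
    w = w + eta_w * phi.T @ e                        # weight updates
    C = C + eta_c * (g[:, :, None] * diff / S ** 2).sum(0)   # centers
    S = S + eta_s * (g[:, :, None] * diff ** 2 / S ** 3).sum(0)  # widths
    return w0, w, C, S
```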
Elliptical Basis Function Networks

EBF networks generalize the spherical Gaussian to a full covariance:

    \phi_j(x_p) = \exp\left\{ -\frac{1}{2} (x_p - \mu_j)^T \Sigma_j^{-1} (x_p - \mu_j) \right\}

where \mu_j are the function centers and \Sigma_j the covariance
matrices. With inputs x_1, ..., x_n, basis functions \phi_1, ...,
\phi_J (and \phi_0 \equiv 1), and outputs y_1(x), ..., y_K(x), the
network computes

    y_k(x_p) = \sum_{j=0}^{J} w_{kj} \phi_j(x_p)

The weight matrix W is again obtained by least squares, via the
pseudo-inverse of the design matrix.
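A sketch of a single elliptical basis function (names are ours; a
linear solve replaces the explicit inverse of Sigma_j):

```python
import numpy as np

# phi_j(x) = exp{-0.5 (x - mu)^T Sigma^{-1} (x - mu)}
def ebf(x, mu, Sigma):
    d = x - mu
    return np.exp(-0.5 * d @ np.linalg.solve(Sigma, d))
```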
K-means and sample covariance:

K-means:

    \mu_j = \frac{1}{N_j} \sum_{x \in C_j} x,
    \quad \text{where } x \in C_j \text{ if } \| x - \mu_j \| \le \| x - \mu_k \| \; \forall k \ne j

Sample covariance:

    \Sigma_j = \frac{1}{N_j} \sum_{x \in C_j} (x - \mu_j)(x - \mu_j)^T

Alternatively, the centers and covariances can be estimated jointly by
the EM algorithm.
[Figure: EBF network's output, "Output 1 of an EBF network (bias, no
rescale, gamma = 1)" for 'nxor.ebf4.Y.N.1.dat', plotted as a surface
over the two input dimensions.]
RBFN for Pattern Classification

    MLP: hyperplanes        RBF: kernel functions

The probability density function (also called the conditional density
function or likelihood) of the k-th class is defined as p(x | C_k).
According to Bayes' theorem, the posterior probability is

    P(C_k | x) = \frac{p(x | C_k) P(C_k)}{p(x)}

where P(C_k) is the prior probability and

    p(x) = \sum_{r} p(x | C_r) P(C_r).

It is possible to use a common pool of M basis functions, labeled by an
index j, to represent all of the class-conditional densities, i.e.

    p(x | C_k) = \sum_{j=1}^{M} p(x | j) P(j | C_k)
[Figure: a common pool of densities p(x | 1), p(x | 2), ..., p(x | M)
is mixed with weights P(j | C_k) to form each class-conditional
density p(x | C_k) = \sum_{j=1}^{M} p(x | j) P(j | C_k).]
The unconditional density p(x) can be expressed with the same pool of
basis functions:

    p(x) = \sum_{k} p(x | C_k) P(C_k)
         = \sum_{k} \sum_{j=1}^{M} p(x | j) P(j | C_k) P(C_k)
         = \sum_{j=1}^{M} p(x | j) P(j)

Substituting into Bayes' theorem gives

    P(C_k | x) = \frac{p(x | C_k) P(C_k)}{p(x)}
               = \frac{\sum_{j=1}^{M} p(x | j) P(j | C_k) P(C_k)}{\sum_{j'=1}^{M} p(x | j') P(j')}
Multiplying and dividing each term by P(j),

    P(C_k | x) = \sum_{j=1}^{M} \frac{P(j | C_k) P(C_k)}{P(j)}
                 \cdot \frac{p(x | j) P(j)}{\sum_{j'=1}^{M} p(x | j') P(j')}
               = \sum_{j=1}^{M} w_{kj} \phi_j(x)

Hidden node's output: the posterior probability of the j-th set of
features being present in the input,

    \phi_j(x) = P(j | x)

Weight: the posterior probability of class membership, given the
presence of the j-th set of features,

    w_{kj} = P(C_k | j)

Note that there is no bias term.
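A sketch of this probabilistic reading (names are ours): the hidden
outputs are P(j | x), the weights are P(C_k | j), and the network
output is the class posterior:

```python
import numpy as np

def class_posteriors(px_given_j, P_j, P_Ck_given_j):
    # px_given_j: (M,) densities p(x|j);  P_j: (M,) pool priors P(j)
    # P_Ck_given_j: (K, M) weights w_kj = P(C_k|j)
    phi = px_given_j * P_j
    phi /= phi.sum()            # phi_j(x) = P(j|x)
    return P_Ck_given_j @ phi   # (K,) posteriors P(C_k|x)
```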
Comparison of RBF and MLP

                          RBF networks              MLP
Learning speed            Very fast                 Very slow
Convergence               Almost guaranteed         Not guaranteed
Response time             Slow                      Fast
Memory requirement        Very large                Small
Hardware implementation   IBM ZISC036,              Voice Direct 364
                          Nestor Ni1000             (www.sensoryinc.com)
                          (www-5.ibm.com/fr/cdlab/zisc.html)
Generalization            Usually better            Usually poorer

To learn more about NN hardware, see
http://www.particle.kth.se/~lindsey/HardwareNNWCourse/home.html