Radial Basis Function Networks
M.W. Mak
1. Introduction
2. Finding RBF Parameters
3. Decision Surface of RBF Networks
4. Comparison between RBF and BP
1. Introduction
 MLPs are highly non-linear in the parameter space, so gradient descent can become trapped in local minima.
 RBF networks avoid this problem by dividing the learning into two independent processes: determining the basis-function centers and widths first, then solving a linear problem for the output weights w.
 RBF networks implement the function

$$s(\mathbf{x}) = w_0 + \sum_{i=1}^{M} w_i\,\phi(\|\mathbf{x} - \mathbf{c}_i\|)$$

 The parameters $w_i$, $\sigma_i$, and $\mathbf{c}_i$ can be determined separately
 Fast learning algorithm
 Basis function types (with $r = \|\mathbf{x} - \mathbf{c}_i\|$):

$$\phi(r) = r^2 \log(r) \qquad \text{(thin-plate spline)}$$
$$\phi(r) = \exp(-r^2 / 2\sigma^2) \qquad \text{(Gaussian)}$$
$$\phi(r) = (r^2 + \sigma^2)^{1/2} \qquad \text{(multiquadric)}$$
$$\phi(r) = (r^2 + \sigma^2)^{-1/2} \qquad \text{(inverse multiquadric)}$$
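As an illustration of this mapping, here is a minimal NumPy sketch for the Gaussian basis; the centers, width, and weights below are arbitrary placeholders rather than fitted values.

```python
import numpy as np

def rbf_output(x, centers, sigma, w):
    """s(x) = w0 + sum_i w_i * exp(-||x - c_i||^2 / (2 sigma^2))."""
    r2 = np.sum((centers - x) ** 2, axis=1)      # squared distances ||x - c_i||^2
    phi = np.exp(-r2 / (2.0 * sigma ** 2))       # Gaussian basis responses
    return w[0] + phi @ w[1:]                    # bias w0 plus weighted sum

# Placeholder parameters (illustrative only).
centers = np.array([[0.0, 0.0], [1.0, 1.0]])     # M = 2 centers, n = 2 inputs
w = np.array([0.1, 0.5, -0.3])                   # [w0, w1, w2]
print(rbf_output(np.array([0.5, 0.5]), centers, sigma=1.0, w=w))
```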
 For Gaussian basis functions,

$$s(\mathbf{x}_p) = w_0 + \sum_{i=1}^{M} w_i\,\phi(\|\mathbf{x}_p - \mathbf{c}_i\|)
= w_0 + \sum_{i=1}^{M} w_i \exp\left(-\sum_{j=1}^{n} \frac{(x_{pj} - c_{ij})^2}{2\sigma_{ij}^2}\right)$$

 Assume the variances $\sigma_i^2$ across each dimension are equal:

$$s(\mathbf{x}_p) = w_0 + \sum_{i=1}^{M} w_i \exp\left(-\frac{1}{2\sigma_i^2}\sum_{j=1}^{n} (x_{pj} - c_{ij})^2\right)$$
 To write this in matrix form, let $a_{pi} = \phi(\|\mathbf{x}_p - \mathbf{c}_i\|)$, so that

$$s(\mathbf{x}_p) = \sum_{i=0}^{M} w_i\,a_{pi}, \qquad a_{p0} = 1$$

where

$$\begin{bmatrix} s(\mathbf{x}_1) \\ s(\mathbf{x}_2) \\ \vdots \\ s(\mathbf{x}_N) \end{bmatrix}
= \begin{bmatrix}
1 & a_{11} & a_{12} & \cdots & a_{1M} \\
1 & a_{21} & a_{22} & \cdots & a_{2M} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & a_{N1} & a_{N2} & \cdots & a_{NM}
\end{bmatrix}
\begin{bmatrix} w_0 \\ w_1 \\ \vdots \\ w_M \end{bmatrix}$$

i.e. $\mathbf{s} = \mathbf{A}\mathbf{w}$, so $\mathbf{w} = \mathbf{A}^{-1}\mathbf{s}$ when $\mathbf{A}$ is square and invertible.
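A short sketch of how $\mathbf{A}$ can be built in practice for Gaussian basis functions. Since $\mathbf{A}$ is generally $N \times (M+1)$ rather than square, the inverse is replaced by a least-squares solve; the data, centers, and targets below are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))                     # N = 20 patterns, n = 2 inputs
centers = X[:3]                                  # M = 3 centers (placeholders)
sigma = 1.0

# a_pi = phi(||x_p - c_i||); prepend a column of ones for the bias w0.
r2 = np.sum((X[:, None, :] - centers[None, :, :]) ** 2, axis=2)
A = np.hstack([np.ones((len(X), 1)), np.exp(-r2 / (2 * sigma**2))])

s = rng.normal(size=len(X))                      # target outputs (illustrative)
w, *_ = np.linalg.lstsq(A, s, rcond=None)        # least-squares solve for w
```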
2. Finding the RBF Parameters
 Use the K-means algorithm to find the centers c_i
[Figure: training samples grouped into clusters, each with a center c_i and an associated width σ_1, σ_2, ...]
K-means Algorithm
Step 1: K initial cluster centers are chosen randomly from the samples to form K groups.
Step 2: Each sample is assigned to the group whose mean is closest to that sample.
Step 3: Adjust the mean of each group to take account of its new points.
Step 4: Repeat Steps 2 and 3 until the distance between the old means and the new means of all clusters is smaller than a predefined tolerance.
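A minimal batch sketch of these four steps; the data, K, and tolerance are placeholders, and the sketch assumes no cluster becomes empty.

```python
import numpy as np

def kmeans(X, K, tol=1e-6, rng=np.random.default_rng(0)):
    # Step 1: choose K initial centers randomly from the samples.
    means = X[rng.choice(len(X), K, replace=False)].copy()
    while True:
        # Step 2: assign each sample to the group with the closest mean.
        d = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2)
        groups = d.argmin(axis=1)
        # Step 3: adjust each group's mean to account for its points.
        new_means = np.array([X[groups == k].mean(axis=0) for k in range(K)])
        # Step 4: stop when the means move less than the tolerance.
        if np.linalg.norm(new_means - means) < tol:
            return new_means
        means = new_means

X = np.random.default_rng(1).normal(size=(100, 2))
centers = kmeans(X, K=4)
```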
Outcome: There are K clusters whose means represent the centroids of the clusters.
Advantages: (1) A fast and simple algorithm.
(2) Reduces the effect of noisy samples.
 Use the K-nearest-neighbor rule to find the function width $\sigma_i$:

$$\sigma_i^2 = \frac{1}{K}\sum_{k=1}^{K} \|\mathbf{c}_k - \mathbf{c}_i\|^2$$

where $\mathbf{c}_k$ is the k-th nearest neighbor of $\mathbf{c}_i$.
 The objective is to cover the training points so that a
smooth fit of the training samples can be achieved
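A sketch of this width rule, taking each center's K nearest fellow centers; the centers and K below are placeholders.

```python
import numpy as np

def knn_widths(centers, K=2):
    """sigma_i^2 = (1/K) * sum of squared distances to the K nearest centers."""
    d2 = np.sum((centers[:, None, :] - centers[None, :, :]) ** 2, axis=2)
    d2.sort(axis=1)                       # column 0 is each center's distance to itself (0)
    return np.sqrt(d2[:, 1:K + 1].mean(axis=1))

centers = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]])
print(knn_widths(centers, K=2))           # one width per center
```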
[Figure: centers and widths found by K-means and K-NN]
 Determining the weights $\mathbf{w}$ using the least-squares method:

$$E = \sum_{p=1}^{N} \left( d_p - \sum_{j=0}^{M} w_j\,\phi(\|\mathbf{x}_p - \mathbf{c}_j\|) \right)^2$$

where $d_p$ is the desired output for pattern $p$. In matrix form,

$$E = (\mathbf{d} - \mathbf{A}\mathbf{w})^T (\mathbf{d} - \mathbf{A}\mathbf{w})$$

Setting $\partial E / \partial \mathbf{w} = \mathbf{0}$ gives

$$\mathbf{w} = (\mathbf{A}^T\mathbf{A})^{-1}\mathbf{A}^T\mathbf{d}$$
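In code, the normal-equations formula can be written directly, though for the numerical reasons discussed on the following slides a least-squares solver is usually preferred; the design matrix and targets here are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(50, 6))                     # design matrix (bias + 5 basis outputs)
d = rng.normal(size=50)                          # desired outputs d_p

w_normal = np.linalg.solve(A.T @ A, A.T @ d)     # w = (A^T A)^{-1} A^T d
w_lstsq, *_ = np.linalg.lstsq(A, d, rcond=None)  # numerically safer equivalent
print(np.allclose(w_normal, w_lstsq))
```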
Let $E$ be the total squared error between the actual output and the target output $\mathbf{d} = [d_1, d_2, \ldots, d_N]^T$:

$$E = (\mathbf{d} - \mathbf{A}\mathbf{w})^T(\mathbf{d} - \mathbf{A}\mathbf{w})
= \mathbf{d}^T\mathbf{d} - \mathbf{d}^T\mathbf{A}\mathbf{w} - \mathbf{w}^T\mathbf{A}^T\mathbf{d} + \mathbf{w}^T\mathbf{A}^T\mathbf{A}\mathbf{w}$$

$$\frac{\partial E}{\partial \mathbf{w}} = -2\mathbf{A}^T\mathbf{d} + 2\mathbf{A}^T\mathbf{A}\mathbf{w} = \mathbf{0}
\;\Rightarrow\; \mathbf{A}^T\mathbf{A}\mathbf{w} = \mathbf{A}^T\mathbf{d}
\;\Rightarrow\; \mathbf{w} = (\mathbf{A}^T\mathbf{A})^{-1}\mathbf{A}^T\mathbf{d}$$
Note that

$$\frac{\partial}{\partial \mathbf{x}}(\mathbf{y}^T\mathbf{x}) = \frac{\partial}{\partial \mathbf{x}}(\mathbf{x}^T\mathbf{y}) = \mathbf{y}, \qquad
\frac{\partial}{\partial \mathbf{x}}(\mathbf{x}^T\mathbf{A}^T\mathbf{A}\mathbf{x}) = 2\mathbf{A}^T\mathbf{A}\mathbf{x}$$

Problems
(1) Susceptible to round-off error.
(2) No solution if $\mathbf{A}^T\mathbf{A}$ is singular.
(3) If $\mathbf{A}^T\mathbf{A}$ is close to singular, we get very large components in $\mathbf{w}$.
Reasons
(1) Inaccuracy in forming $\mathbf{A}^T\mathbf{A}$.
(2) If $\mathbf{A}$ is ill-conditioned, a small change in $\mathbf{A}$ introduces a large change in $(\mathbf{A}^T\mathbf{A})^{-1}$.
(3) If $\mathbf{A}^T\mathbf{A}$ is close to singular, dependent columns exist in $\mathbf{A}^T\mathbf{A}$, e.g. two nearly parallel straight lines in the $(x, y)$ plane.
A singular matrix:

$$\begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$$

If the lines are nearly parallel, they intersect each other at $(\pm\infty, \mp\infty)$, i.e.

$$\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} \infty \\ -\infty \end{bmatrix}
\quad\text{or}\quad
\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} -\infty \\ \infty \end{bmatrix}$$

So the magnitude of the solution becomes very large; hence overflow will occur. The effect of the large components would cancel out only if the machine precision were infinite.
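A small numerical illustration of the nearly parallel case; the 1e-6 perturbation is an arbitrary choice.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 4.0 + 1e-6]])        # nearly parallel lines
b = np.array([0.0, 1.0])
print(np.linalg.cond(A))                 # huge condition number (~1e7)
print(np.linalg.solve(A, b))             # solution components of order 1e6
```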
If the machine precision is finite, we get a large error. For example, in exact arithmetic

$$\begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix}\begin{bmatrix} 2\times 10^{38} \\ -1\times 10^{38} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$

but finite machine precision (e.g. storing the entry 4 as 4.00001) gives

$$\begin{bmatrix} 1 & 2 \\ 2 & 4.00001 \end{bmatrix}\begin{bmatrix} 2\times 10^{38} \\ -1\times 10^{38} \end{bmatrix} = \begin{bmatrix} 0 \\ -1\times 10^{33} \end{bmatrix}$$

Solution: Singular Value Decomposition
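A sketch of the SVD-based solution: singular values below a cutoff are dropped rather than inverted, which is essentially what np.linalg.pinv does internally; the cutoff here is an illustrative choice.

```python
import numpy as np

def svd_solve(A, d, rcond=1e-10):
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    s_inv = np.where(s > rcond * s.max(), 1.0 / s, 0.0)  # zero out tiny singular values
    return Vt.T @ (s_inv * (U.T @ d))    # w = V diag(1/s) U^T d

A = np.array([[1.0, 2.0], [2.0, 4.00001]])
d = np.array([0.0, 1.0])
print(svd_solve(A, d))                   # stable even when A is near-singular
```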
 RBF learning process:

$$\mathbf{x}_p \;\xrightarrow{\text{K-means}}\; \mathbf{c}_i \;\xrightarrow{\text{K-nearest neighbor}}\; \sigma_i \;\xrightarrow{\text{basis functions}}\; \mathbf{A} \;\xrightarrow{\text{linear regression}}\; \mathbf{w}$$
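Putting the whole process together, a compact sketch under the same assumptions as the fragments above (Gaussian basis functions, placeholder data, a fixed number of Lloyd iterations, no empty clusters).

```python
import numpy as np

rng = np.random.default_rng(0)
X, d = rng.normal(size=(200, 2)), rng.normal(size=200)   # placeholder data

# 1. K-means -> centers c_i (fixed number of Lloyd passes shown).
C = X[rng.choice(len(X), 8, replace=False)]
for _ in range(20):
    g = np.linalg.norm(X[:, None] - C[None], axis=2).argmin(axis=1)
    C = np.array([X[g == k].mean(axis=0) for k in range(len(C))])

# 2. K-nearest neighbors -> widths sigma_i (K = 2 neighbors).
d2 = np.sum((C[:, None] - C[None]) ** 2, axis=2)
d2.sort(axis=1)
sigma = np.sqrt(d2[:, 1:3].mean(axis=1))

# 3. Basis functions -> design matrix A (with bias column).
r2 = np.sum((X[:, None] - C[None]) ** 2, axis=2)
A = np.hstack([np.ones((len(X), 1)), np.exp(-r2 / (2 * sigma**2))])

# 4. Linear regression -> weights w.
w, *_ = np.linalg.lstsq(A, d, rcond=None)
```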
 RBF learning by gradient descent. Let

$$\phi_i(\mathbf{x}_p) = \exp\left(-\sum_{j=1}^{n} \frac{(x_{pj} - c_{ij})^2}{2\sigma_{ij}^2}\right), \qquad
e(\mathbf{x}_p) = d(\mathbf{x}_p) - s(\mathbf{x}_p),$$

and

$$E = \frac{1}{2}\sum_{p=1}^{N} e(\mathbf{x}_p)^2.$$

Applying gradient descent via $\frac{\partial E}{\partial w_i}$, $\frac{\partial E}{\partial \sigma_{ij}}$, and $\frac{\partial E}{\partial c_{ij}}$,
we have the following update equations:

$$w_i(t+1) = w_i(t) + \eta_w \sum_{p=1}^{N} e(\mathbf{x}_p)\,\phi_i(\mathbf{x}_p) \quad \text{when } i = 1, 2, \ldots, M$$

$$w_i(t+1) = w_i(t) + \eta_w \sum_{p=1}^{N} e(\mathbf{x}_p) \quad \text{when } i = 0$$

$$\sigma_{ij}(t+1) = \sigma_{ij}(t) + \eta_\sigma \sum_{p=1}^{N} e(\mathbf{x}_p)\,w_i\,\phi_i(\mathbf{x}_p)\,\frac{(x_{pj} - c_{ij}(t))^2}{\sigma_{ij}^3(t)}$$

$$c_{ij}(t+1) = c_{ij}(t) + \eta_c \sum_{p=1}^{N} e(\mathbf{x}_p)\,w_i\,\phi_i(\mathbf{x}_p)\,\frac{x_{pj} - c_{ij}(t)}{\sigma_{ij}^2(t)}$$
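A sketch of one full-batch epoch of these update rules; the learning rates, data, and initialization are placeholders.

```python
import numpy as np

def gd_epoch(X, d, w, C, S, eta_w=0.01, eta_c=0.01, eta_s=0.01):
    """One batch update of w, c_ij, and sigma_ij by the rules above."""
    diff = X[:, None, :] - C[None, :, :]                  # (N, M, n): x_pj - c_ij
    phi = np.exp(-0.5 * np.sum(diff**2 / S**2, axis=2))   # (N, M) basis outputs
    e = d - (w[0] + phi @ w[1:])                          # errors e(x_p)
    w[0] += eta_w * e.sum()                               # i = 0 (bias)
    w[1:] += eta_w * (e @ phi)                            # i = 1..M
    g = (e[:, None] * w[1:] * phi)[:, :, None]            # shared factor per (p, i)
    C += eta_c * np.sum(g * diff / S**2, axis=0)
    S += eta_s * np.sum(g * diff**2 / S**3, axis=0)
    return w, C, S

rng = np.random.default_rng(0)
X, d = rng.normal(size=(50, 2)), rng.normal(size=50)
w, C = rng.normal(size=4), X[:3].copy()                   # M = 3 centers
S = np.ones((3, 2))                                       # sigma_ij
w, C, S = gd_epoch(X, d, w, C, S)
```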
Elliptical Basis Function networks

$$\phi_j(\mathbf{x}_p) = \exp\left\{-\frac{1}{2}(\mathbf{x}_p - \boldsymbol{\mu}_j)^T \boldsymbol{\Sigma}_j^{-1} (\mathbf{x}_p - \boldsymbol{\mu}_j)\right\}$$

$\boldsymbol{\mu}_j$: function centers
$\boldsymbol{\Sigma}_j$: covariance matrices

$$y_k(\mathbf{x}_p) = \sum_{j=0}^{J} w_{kj}\,\phi_j(\mathbf{x}_p)$$

[Figure: network with inputs $x_1, x_2, \ldots, x_n$, basis functions $\phi_1, \phi_2, \ldots, \phi_M$, and outputs $y_1(\mathbf{x}), \ldots, y_K(\mathbf{x})$]

In matrix form, $\mathbf{Y} = \boldsymbol{\Phi}\mathbf{W}$, and the weights follow from the pseudo-inverse: $\mathbf{W} = \boldsymbol{\Phi}^{+}\mathbf{D}$.
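A minimal sketch of the elliptical basis function, with the Euclidean distance replaced by the Mahalanobis distance; the center and covariance are placeholders.

```python
import numpy as np

def ebf(x, mu, Sigma):
    """phi_j(x) = exp(-0.5 * (x - mu)^T Sigma^{-1} (x - mu))."""
    diff = x - mu
    return np.exp(-0.5 * diff @ np.linalg.solve(Sigma, diff))

mu = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.3],
                  [0.3, 2.0]])          # full covariance -> elliptical contours
print(ebf(np.array([1.0, -0.5]), mu, Sigma))
```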
 K-means and sample covariance

K-means:

$$\hat{\boldsymbol{\mu}}_j = \frac{1}{N_j}\sum_{\mathbf{x} \in X_j}\mathbf{x}, \qquad
\mathbf{x} \in X_j \;\text{if}\; \|\mathbf{x} - \hat{\boldsymbol{\mu}}_j\| \le \|\mathbf{x} - \hat{\boldsymbol{\mu}}_k\| \;\forall\, k \ne j$$

Sample covariance:

$$\hat{\boldsymbol{\Sigma}}_j = \frac{1}{N_j}\sum_{\mathbf{x} \in X_j}(\mathbf{x} - \hat{\boldsymbol{\mu}}_j)(\mathbf{x} - \hat{\boldsymbol{\mu}}_j)^T$$

 The EM algorithm offers an alternative way of estimating $\boldsymbol{\mu}_j$ and $\boldsymbol{\Sigma}_j$ (see the sketch after this slide for the K-means-based estimators).
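A sketch of these two estimators from a hard K-means assignment (EM would replace the hard assignment with posterior weights); the data and assignment are placeholders.

```python
import numpy as np

def cluster_stats(X, groups, K):
    """Per-cluster sample mean and covariance from hard assignments."""
    mus, Sigmas = [], []
    for j in range(K):
        Xj = X[groups == j]
        mu = Xj.mean(axis=0)
        diff = Xj - mu
        mus.append(mu)
        Sigmas.append(diff.T @ diff / len(Xj))   # (1/N_j) sum (x-mu)(x-mu)^T
    return np.array(mus), np.array(Sigmas)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
groups = rng.integers(0, 2, size=100)            # placeholder assignment
mus, Sigmas = cluster_stats(X, groups, K=2)
```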
EBF vs. RBF networks

[Figure: decision boundaries on two-class data (Class1, Class2), inputs ranging from -3 to 3 on both axes; left: RBFN with 4 centers, right: EBFN with 4 centers]
Elliptical Basis Function Networks

[Figure: output 1 of an EBF network (bias, no rescale, gamma = 1), plotted as a surface over inputs in [-3, 3] x [-3, 3]; data file 'nxor.ebf4.Y.N.1.dat']
RBFN for Pattern Classification

MLP: hyperplane decision boundaries. RBF: kernel functions.

The probability density function (also called the conditional density function or likelihood) of the k-th class is defined as $p(\mathbf{x} \mid C_k)$.
• According to Bayes' theorem, the posterior probability is

$$P(C_k \mid \mathbf{x}) = \frac{p(\mathbf{x} \mid C_k)\,P(C_k)}{p(\mathbf{x})}$$

where $P(C_k)$ is the prior probability and

$$p(\mathbf{x}) = \sum_{r} p(\mathbf{x} \mid C_r)\,P(C_r)$$

• It is possible to use a common pool of M basis functions, labeled by an index j, to represent all of the class-conditional densities, i.e.

$$p(\mathbf{x} \mid C_k) = \sum_{j=1}^{M} p(\mathbf{x} \mid j)\,P(j \mid C_k)$$
[Figure: a common pool of basis functions $p(\mathbf{x} \mid 1), p(\mathbf{x} \mid 2), \ldots, p(\mathbf{x} \mid M)$ feeding every class-conditional density $p(\mathbf{x} \mid C_k)$ through the mixing coefficients $P(j \mid C_k)$]

$$p(\mathbf{x} \mid C_k) = \sum_{j=1}^{M} p(\mathbf{x} \mid j)\,P(j \mid C_k)$$
The unconditional density then becomes

$$p(\mathbf{x}) = \sum_{k} p(\mathbf{x} \mid C_k)\,P(C_k)
= \sum_{j=1}^{M} p(\mathbf{x} \mid j) \sum_{k} P(j \mid C_k)\,P(C_k)
= \sum_{j=1}^{M} p(\mathbf{x} \mid j)\,P(j)$$

where $P(j) = \sum_{k} P(j \mid C_k)\,P(C_k)$. Substituting into Bayes' theorem,

$$P(C_k \mid \mathbf{x}) = \frac{\sum_{j=1}^{M} p(\mathbf{x} \mid j)\,P(j \mid C_k)\,P(C_k)}{\sum_{j'=1}^{M} p(\mathbf{x} \mid j')\,P(j')}$$
Rewriting the posterior as a sum over the basis functions,

$$P(C_k \mid \mathbf{x})
= \sum_{j=1}^{M} \frac{P(j \mid C_k)\,P(C_k)}{P(j)} \cdot \frac{p(\mathbf{x} \mid j)\,P(j)}{\sum_{j'=1}^{M} p(\mathbf{x} \mid j')\,P(j')}
= \sum_{j=1}^{M} w_{kj}\,\phi_j(\mathbf{x})$$

Hidden node's output: $\phi_j(\mathbf{x}) = P(j \mid \mathbf{x})$, the posterior probability of the j-th set of features being present in the input $\mathbf{x}$.

Weight: $w_{kj} = P(C_k \mid j)$, the posterior probability of class membership, given the presence of the j-th set of features.

No bias term.
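A sketch of this probabilistic reading: the hidden outputs are the normalized $p(\mathbf{x} \mid j)P(j)$, the weights are $P(C_k \mid j)$, and the class posteriors then sum to one without any bias term. All probabilities below are made-up placeholders.

```python
import numpy as np

p_x_given_j = np.array([0.2, 0.05, 0.6])     # p(x|j) for M = 3 basis functions
P_j = np.array([0.3, 0.3, 0.4])              # priors P(j)

phi = p_x_given_j * P_j
phi /= phi.sum()                             # phi_j(x) = P(j|x)

W = np.array([[0.9, 0.2, 0.1],               # w_kj = P(C_k|j); columns sum to 1
              [0.1, 0.8, 0.9]])
posterior = W @ phi                          # P(C_k|x) = sum_j w_kj phi_j(x)
print(posterior, posterior.sum())            # sums to 1; no bias term needed
```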
Comparison of RBF and MLP

                          RBF networks                        MLP
Learning speed            Very fast                           Very slow
Convergence               Almost guaranteed                   Not guaranteed
Response time             Slow                                Fast
Memory requirement        Very large                          Small
Hardware implementation   IBM ZISC036, Nestor Ni1000          Voice Direct 364
                          (www-5.ibm.com/fr/cdlab/zisc.html)  (www.sensoryinc.com)
Generalization            Usually better                      Usually poorer

To learn more about NN hardware, see
http://www.particle.kth.se/~lindsey/HardwareNNWCourse/home.html