Clustering of Relational Data Containing Noise and Outliers
Sumit Sen and Rajesh N. Davé
Department of Mechanical Engineering
New Jersey Institute of Technology
University Heights, Newark, NJ 07102-1982

Corresponding author (dave@shiva.njit.edu)
Abstract
The concept of the noise clustering (NC) algorithm is
applied to several fuzzy relational data clustering
algorithms to make them more robust against noise and
outliers. The methods considered here include
techniques proposed by Roubens, RFCM of Hathaway et al., and FANNY by Kaufman and Rousseeuw. A new
fuzzy relational data clustering (FRC) algorithm is
proposed through generalization of FANNY. The FRC
algorithm is shown to have the same objective
functional as the RFCM algorithm. However, through
use of direct objective function minimization based on
the Lagrange multiplier technique, the necessary
conditions for minimization are derived without
imposition of the restriction that the relational data is
derived from Euclidean measure of distance from object
data. Robustness of the new algorithm is demonstrated
through several examples.
1. Introduction
Relational data comes from a measure of dissimilarity
(or similarity) between objects, and in some cases it is
actually based on object data. Relational data can also be
based on subjective expert knowledge, see for example
the microcomputer data in Gowda and Diday [7], or
subjective dissimilarity between countries in Kaufman
and Rousseeuw [11]. Fuzzy techniques for clustering
relational data include the methods by Ruspini [13],
Roubens [12], Windham [14], Hathaway et al. [10], and
Kaufman and Rousseeuw [11]. It is claimed in [11] that use
of the L1 norm in the FANNY algorithm makes it somewhat robust against noise. However, for a general form of relational data, such a claim is not valid because the data need not come from the L1 norm. Hence, in general, these methods are
sensitive to noise and outliers in the data.
Recently, several techniques have been introduced to
increase robustness of algorithms for clustering of object
data, see for example Davé [3], and a review by Davé and
Krishnapuram [4]. Davé [3] proposed the concept of
noise clustering for making the fuzzy c-means (FCM) [1]
and related object data algorithms robust against noise.
Consequently, use of such technique in all the
derivatives of FCM type object data clustering
algorithms would make those algorithms robust against
noise. Based on this observation, Hathaway et al. [9]
have suggested that incorporation of the concept of
noise clustering [3] in their relational dual of fuzzy
c-means (FCM) algorithm, called RFCM, would make it
robust. However, no results or specific algorithm were
presented. In this paper, application of the concept of
noise clustering is considered to specifically address the
problem of robustness in popular relational clustering
techniques. This includes the techniques by Roubens
[12], Hathaway et al. [10] (RFCM) and development of a
generalized version of the FANNY algorithm due to Kaufman and Rousseeuw [11] through an approach that is based
on directly converting the original functional to a noise
clustering functional.
2. Noise clustering technique
Noise clustering (NC) was specifically introduced for
making FCM and related algorithms robust. The
following discussion considers the NC technique for FCM applied to object data clustering. In the NC technique
proposed in [3], noise is considered to be a separate
class, and its prototype has the same distance, δ, from all
the feature vectors. The membership u*j of a data point xj
in the noise cluster is defined as,
$$u_{*j} = 1 - \sum_{i=1}^{c} u_{ij}\,.$$   (1)
where c is the number of clusters and uij denotes the
grade of membership (belonging) of point xj in the ith
fuzzy subset of X. Since (1) is used to define the
membership u*j in the noise class, the usual membership
constraint of FCM is not required. Thus, the membership
constraint for the good clusters is effectively relaxed to
$$\sum_{i=1}^{c} u_{ij} \le 1\,.$$   (2)
This allows noise points to have arbitrarily small
membership values in good clusters. The objective
function is given as
$$J = \sum_{i=1}^{c}\sum_{j=1}^{n} u_{ij}^{m}\, d^{2}(x_j,\beta_i) \;+\; \sum_{j=1}^{n} \delta^{2}\Big(1-\sum_{i=1}^{c}u_{ij}\Big)^{m}$$   (3)
In (3), d²(xj, βi) is the squared distance from a feature point xj to the prototype βi, and δ is the noise distance. The equation for the memberships is given as,
$$u_{ij} = \left[\; \sum_{k=1}^{c}\left(\frac{d_{ij}^{2}}{d_{kj}^{2}}\right)^{1/(m-1)} + \left(\frac{d_{ij}^{2}}{\delta^{2}}\right)^{1/(m-1)} \right]^{-1}$$   (4)
In the above, dij² is equivalent to d²(xj, βi). The
memberships for FCM do not have the second term in
their denominators, and thus, the NC memberships are
different. In the next section, the concepts of NC
technique are applied to existing relational data clustering
techniques.
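To make the update concrete, a minimal sketch of the membership computation of (4) is given below (Python with NumPy is assumed; this is only an illustration of the equations above, not the authors' implementation, and the variable names are ours).

import numpy as np

def nc_memberships(d2, delta, m=2.0):
    # d2    : (c, n) array of squared point-to-prototype distances d^2(x_j, beta_i)
    # delta : noise distance (the same for all points, as in Section 2)
    # m     : fuzzifier, m > 1
    # Returns the (c, n) array of memberships u_ij of eq. (4); the noise
    # membership of each point is then u_*j = 1 - sum_i u_ij, as in eq. (1).
    p = 1.0 / (m - 1.0)
    denom = np.zeros_like(d2)
    for k in range(d2.shape[0]):
        denom += (d2 / d2[k]) ** p        # sum over the c good clusters in (4)
    denom += (d2 / delta ** 2) ** p       # extra noise term in the denominator
    return 1.0 / denom

For a point that is far from every prototype (all squared distances much larger than the squared noise distance), the noise term dominates every denominator, so all u_ij become small and u_*j approaches one, which is exactly the robustness mechanism described above.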
3. NC applied to relational clustering
The concept of noise clustering works well for object
data clustering methods such as FCM, as the definition
of the noise distance δ has a direct physical meaning. In
object data clustering, there are object prototypes, and
hence there is a noise prototype. The extension of noise
clustering to relational data clustering techniques is not
obvious, because in a strict sense, there are no cluster
(and hence noise) prototypes in relational clustering, and
there is only a need to generate a partition. We consider
Roubens' [12] method first. For n objects, the relational data is usually an n × n matrix (if the relational measure Rij between objects i and j is symmetric, i.e., Rij = Rji, then only the lower triangular portion of the n × n matrix is required). Roubens considers the following functional.
$$F_{R} = \sum_{i=1}^{c}\sum_{j=1}^{n}\sum_{k=1}^{n} u_{ik}^{2}\,u_{ij}^{2}\,R_{jk}$$   (5)
subject to constraints
$$\sum_{i=1}^{c} u_{ik} = 1, \qquad k = 1,2,\ldots,n$$   (6)
$$u_{ik} \ge 0, \qquad i = 1,2,\ldots,c;\;\; k = 1,2,\ldots,n$$   (7)
$$R_{ij} \ge 0, \quad R_{ii} = 0, \quad \text{and} \quad R_{ij} = R_{ji}\,.$$   (8)
where there are n objects, and c clusters. This can be
converted to noise clustering by adding a noise class,
thus making the number of clusters c + 1. Then, the new
functional becomes,
$$^{N}\!F_{R} = \sum_{i=1}^{c+1}\sum_{j=1}^{n}\sum_{k=1}^{n} u_{ik}^{2}\,u_{ij}^{2}\,R_{jk}$$   (9)
In equation (9), the pre-superscript “N” denotes
extension to noise clustering. In (9) it is not obvious how
to introduce the noise distance, since the noise distance
in [3] is defined as a distance from the noise prototype to
the object data point. In relational data, since there is no
explicit object data available, one must modify the
definition of the noise distance. For this purpose, (9) is
rewritten as follows.
$$^{N}\!F_{R} = \sum_{i=1}^{c}\sum_{j=1}^{n}\sum_{k=1}^{n} u_{ik}^{2}\,u_{ij}^{2}\,(R_{jk})_{i} \;+\; \sum_{j=1}^{n}\sum_{k=1}^{n} u_{*k}^{2}\,u_{*j}^{2}\,(R_{jk})_{*}$$   (10)
Equation (10) is used with the membership constraint
from (1) instead of (6), thus explicitly relating the noise
membership to the other memberships. In (10), the first
term on the right-hand side is the same as the original Roubens
functional, while the second term is due to the extension
to noise clustering. Another modification here is the extra
subscript to the dissimilarity distance - (Rjk)i - denoting
that this is the “value” of dissimilarity between objects j
and k as viewed by class i. Normally, the dissimilarity
should be independent of the class, thus Rjk = (Rjk)i for
all i. However, when introducing the noise class, we must
make a distinction that it is a special class and that it imposes its own bias (or lack thereof) to determine the “amount” of dissimilarity. Then, analogous to the original noise clustering, we specify that the noise class views all dissimilarities as equal. Thus, (Rjk)* = δ, the dissimilarity noise distance. This noise distance can be the same for
all cases, or in a manner similar to the generalized noise
clustering [5], it could take different values for different
pairs of points as well as clusters. In this paper we
restrict this to be a constant value, and thus (10) is
written as,
$$^{N}\!F_{R} = \sum_{i=1}^{c}\sum_{j=1}^{n}\sum_{k=1}^{n} u_{ik}^{2}\,u_{ij}^{2}\,R_{jk} \;+\; \sum_{j=1}^{n}\sum_{k=1}^{n} u_{*k}^{2}\,u_{*j}^{2}\,\delta$$   (11)
An algorithm to minimize (11) follows:
NC version of Roubens Algorithm
1. For relational data satisfying (8), fix c, 2 ≤ c ≤ n, and initialize the fuzzy (c+1)-partition, uik. Select the noise distance δ > 0.
2. Compute terms Dik defined as below
$$D_{ik} = \sum_{j=1}^{n} u_{ij}^{2}\,R_{jk}$$   (12)
and the noise term,
$$D_{*k} = \sum_{j=1}^{n} u_{*j}^{2}\,\delta = N_{nc}\,\delta\,,$$   (13)
where Nnc is the equivalent fuzzy cardinality of the noise class, i.e.,
$$N_{nc} = \sum_{j=1}^{n} u_{*j}^{2}\,.$$   (14)
Note that all these terms are ≥ 0.
3. Compute the memberships by solving the new minimization problem that resembles the original noise clustering formulation for FCM:
$$\min_{u_{ik}} \;\; \sum_{i=1}^{c}\sum_{k=1}^{n} u_{ik}^{2}\,D_{ik} \;+\; \sum_{k=1}^{n} u_{*k}^{2}\,D_{*k}$$   (15)
to obtain the memberships as
$$u_{ik} = \left[\,\sum_{j=1}^{c}\frac{D_{ik}}{D_{jk}} \;+\; \frac{D_{ik}}{N_{nc}\,\delta}\right]^{-1}$$   (16)
4. Check for termination using a convenient norm on uik; if terminated, stop, else go to step 2.
In the above, it is easy to see how the noise distance
appears in the solution procedure. It is noted that the
term D*k in (13) is a product of the noise distance δ and
the equivalent fuzzy cardinality of noise class defined in
(14). In step 3, one can also easily compute the
membership in the noise class as shown in (16).
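A compact sketch of steps 1 to 4 above is given below (Python with NumPy; an illustration of (12)-(16) under our own choice of initialization and stopping rule, not the original implementation).

import numpy as np

def nc_roubens(R, c, delta, max_iter=100, tol=1e-5, seed=0):
    # R     : (n, n) symmetric dissimilarity matrix satisfying (8)
    # c     : number of good clusters, 2 <= c <= n
    # delta : noise distance, delta > 0
    # Returns a (c+1, n) membership matrix U whose last row is the noise class.
    n = R.shape[0]
    rng = np.random.default_rng(seed)
    U = rng.random((c + 1, n))
    U /= U.sum(axis=0)                        # initial fuzzy (c+1)-partition
    for _ in range(max_iter):
        D = (U[:c] ** 2) @ R                  # eq. (12): D_ik = sum_j u_ij^2 R_jk
        N_nc = np.sum(U[c] ** 2)              # eq. (14): fuzzy cardinality of the noise class
        D_star = N_nc * delta                 # eq. (13): noise term, identical for every k
        denom = np.vstack([D, np.full(n, D_star)])
        U_new = (1.0 / denom) / np.sum(1.0 / denom, axis=0)   # eq. (16), all c+1 rows at once
        if np.max(np.abs(U_new - U)) < tol:   # step 4: termination check
            return U_new
        U = U_new
    return U

As noted above, the only change relative to the original Roubens update is the extra row built from the noise distance and the fuzzy cardinality of the noise class.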
Next we consider the RFCM algorithm of Hathaway et
al. [10]. Since the functional of RFCM is basically a
normalized version of Roubens functional, it can be
extended to noise clustering in a similar way as follows:
$$^{N}\!F_{RFCM} = \sum_{i=1}^{c}\frac{\sum_{j=1}^{n}\sum_{k=1}^{n} u_{ik}^{m}\,u_{ij}^{m}\,R_{jk}}{2\sum_{t=1}^{n}u_{it}^{m}} \;+\; \frac{\sum_{j=1}^{n}\sum_{k=1}^{n} u_{*k}^{m}\,u_{*j}^{m}\,\delta}{2\sum_{t=1}^{n}u_{*t}^{m}}$$   (17)
Following the arguments in [10], it is clear that the
derivation requires that the relational data be obtained
from Euclidean distances. Thus, besides constraints (6)-(8), one more constraint is required:
$$R_{jk} = d_{jk}^{2} = \left\| x_j - x_k \right\|^{2}, \qquad j,k = 1,\ldots,n$$   (18)
The following equation for the membership vector can be
used for computing the first c vectors Vi,
$$V_i = \big(u_{i1}^{m},\,u_{i2}^{m},\,\ldots,\,u_{in}^{m}\big)^{T}\Big/\sum_{k=1}^{n}u_{ik}^{m}$$   (19)
where the Vi represents a mean (i.e. averaged) unit
vector of memberships for the ith cluster. These are then
used to obtain object to cluster distances, dik, as
follows for the first c classes.
$$d_{ik}^{2} = (R\,V_i)_k - \tfrac{1}{2}\,V_i^{T}R\,V_i, \qquad \text{with } R = [\,R_{jk}\,] = [\,d_{jk}^{2}\,],$$   (20)
However, the noise membership vector (membership of
objects to the noise class) is computed as below.
$$V_* = \big(u_{*1}^{m},\,u_{*2}^{m},\,\ldots,\,u_{*n}^{m}\big)^{T}\Big/\sum_{k=1}^{n}u_{*k}^{m}$$   (21)
These are then used to obtain object to noise cluster
distances, d*k, as follows using equation (20).
$$d_{*k}^{2} = (R_*\,V_*)_k - \tfrac{1}{2}\,V_*^{T}R_*\,V_*, \qquad \text{with } R_* = [\,\delta\,],$$   (22)
In (22), j and k are two objects, and the index * represents the noise cluster. Although it may not be apparent, in this noise cluster extension the dissimilarity distance (Rjk)i is viewed differently by each class; thus the dissimilarity distances in (22) are all equal to δ, because this equation is
written specifically for the noise class. This can be
simplified to show that the object-to-noise-class distance is directly related to δ as
$$d_{*k}^{2} = \delta/2\,.$$   (23)
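The simplification behind (23) can be written out explicitly; since every entry of R_* equals δ and the components of V_* sum to one by (21),
$$(R_*\,V_*)_k = \delta\sum_{j=1}^{n}(V_*)_j = \delta, \qquad V_*^{T}R_*\,V_* = \delta\Big(\sum_{j=1}^{n}(V_*)_j\Big)^{2} = \delta,$$
$$\text{so that}\qquad d_{*k}^{2} = \delta - \tfrac{1}{2}\,\delta = \delta/2\,.$$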
It is clear that (21) is not even required for evaluating
object to noise cluster distances, d*k, as those terms drop
out from (22). Subsequently, the original RFCM equation
for memberships can be modified to obtain memberships
in good classes as
$$u_{ik} = \left[\,\sum_{w=1}^{c}\left(\frac{d_{ik}^{2}}{d_{wk}^{2}}\right)^{1/(m-1)} + \left(\frac{d_{ik}^{2}}{\delta/2}\right)^{1/(m-1)}\right]^{-1}$$   (24)
and the membership in the noise class is
$$u_{*k} = \left[\,\sum_{w=1}^{c}\left(\frac{\delta/2}{d_{wk}^{2}}\right)^{1/(m-1)} + 1\right]^{-1}$$   (25)
The robust version of the RFCM algorithm is:
NC version of RFCM Algorithm
1. For relational data satisfying (18), fix c, 2 ≤ c ≤ n, and m > 1, and initialize the fuzzy (c+1)-partition, uik. Select the noise distance δ, and compute the object-to-noise-cluster distances d*k from (23).
2. Compute the c mean vectors Vi from (19) and then compute the distances dik from (20).
3. Update the memberships uik from (24) and the noise memberships (if required) from (25).
4. Check for termination using a convenient norm on uik; if terminated, stop, else go to step 2.
A careful analysis of the above algorithm shows that the
only major difference between this and the original
RFCM is in equation (24), namely the second term in the denominator on the right-hand side. It may be seen that one does not require explicit computation of the noise memberships, and thus there is only a minor modification necessary from the old algorithm to the new one, namely the addition of a single term to (24). This indicates
the simplicity of this approach.
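For concreteness, a sketch of the NC version of RFCM following steps 1 to 4 above is given below (Python with NumPy; it is only an illustration of (19), (20), (23) and (24), not the original code, and the small clipping of the distances is our own numerical safeguard).

import numpy as np

def nc_rfcm(R, c, delta, m=2.0, max_iter=100, tol=1e-5, seed=0):
    # R     : (n, n) relational matrix assumed to satisfy (18)
    # delta : dissimilarity noise distance
    # Returns the (c, n) memberships in the good clusters; the noise membership
    # follows from (25), or equivalently as 1 minus the column sums, cf. (1).
    n = R.shape[0]
    rng = np.random.default_rng(seed)
    U = rng.random((c, n))
    U /= U.sum(axis=0) + 1.0                  # leave some initial mass for the noise class
    d_star2 = delta / 2.0                     # eq. (23): object-to-noise-cluster distance
    p = 1.0 / (m - 1.0)
    for _ in range(max_iter):
        Um = U ** m
        V = Um / Um.sum(axis=1, keepdims=True)                  # eq. (19)
        RV = V @ R                                              # (R V_i)_k for each i and k
        d2 = RV - 0.5 * np.sum(V * RV, axis=1, keepdims=True)   # eq. (20)
        d2 = np.maximum(d2, 1e-12)                              # guard against round-off
        denom = np.sum((d2[None, :, :] / d2[:, None, :]) ** p, axis=0) \
                + (d2 / d_star2) ** p                           # eq. (24)
        U_new = 1.0 / denom
        if np.max(np.abs(U_new - U)) < tol:
            return U_new
        U = U_new
    return U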
The above two algorithms can be coded to check how
well the noise concept works for relational data. Since Roubens' original algorithm is considered to have stability problems [14], it is not considered here in terms
of results. However, the NC version of RFCM can be
tested on noisy data. This is reported in Section 5. In the
next section, the functional of RFCM from (17) is
considered, and an optimization algorithm based on work
presented in [11] is derived without utilizing the
constraint from (18). It is easy to see that constraint (18)
is much more restrictive as compared to (8).
4. A generalized robust version of FANNY
A fuzzy relational data clustering algorithm called
FANNY (Fuzzy Analysis) [11] considers an objective
functional very similar to the original RFCM functional. The FANNY functional is
$$F_{FANNY} = \sum_{i=1}^{c}\frac{\sum_{j=1}^{n}\sum_{k=1}^{n} u_{ij}^{2}\,u_{ik}^{2}\,R_{jk}}{2\sum_{t=1}^{n}u_{it}^{2}}$$   (26)
with the membership constraints from (6) and (7). In the
above, Rjk is the distance or dissimilarity between objects
j and k, and it is implied [11] to be the L1 distance. The
reader is referred to [11] for details of derivation of an
algorithm that is based on application of Lagrange
multiplier and Kuhn-Tucker conditions to directly
minimize (26) subject to the constraints (6), (7) and (8).
In this paper, we generalize FANNY as follows. The
membership fuzzifying exponent m is used in (26), along with Rjk now denoting any dissimilarity measure. Based on that, one obtains the functional shown below, which looks exactly the same as the original RFCM functional.
$$F_{FRC} = \sum_{i=1}^{c}\frac{\sum_{j=1}^{n}\sum_{k=1}^{n} u_{ik}^{m}\,u_{ij}^{m}\,R_{jk}}{2\sum_{t=1}^{n}u_{it}^{m}}$$   (27)
In the above, the subscript FRC stands for Fuzzy
Relational Clustering, which is an extension of the FANNY technique. To reiterate, the differences between the two are: (a) the fuzzifier m, which makes the fuzzy memberships more general, and (b) the implication that while the relational data in FANNY usually comes from the L1 norm, in FRC it could come from any dissimilarity measure.
The difference due to the use of the fuzzifier m becomes
an important issue when FRC is made robust using the
concept of noise clustering [3]. Hereafter, the version in
(27) is referred to as FRC. While (27) looks like the
original RFCM, the important difference is that constraint
(8) is used here instead of (18). The FRC is extended to the noise clustering concept by adding a noise cluster, thus
modifying (27) to look like (17) as below.
$$^{N}\!F_{FRC} = \sum_{i=1}^{c}\frac{\sum_{j=1}^{n}\sum_{k=1}^{n} u_{ik}^{m}\,u_{ij}^{m}\,R_{jk}}{2\sum_{t=1}^{n}u_{it}^{m}} \;+\; \frac{\sum_{j=1}^{n}\sum_{k=1}^{n} u_{*k}^{m}\,u_{*j}^{m}\,\delta}{2\sum_{t=1}^{n}u_{*t}^{m}}$$   (28)
To derive the necessary conditions for the
minimization of (28), a Lagrangian is constructed based
on the constraint (1), while the inequality constraint in (7)
is ignored with a hope that it may be automatically
satisfied. This treatment is similar to the derivation of the
original FCM algorithm, where the inequality constraint
was not directly included in the optimization problem.
Thus there are many differences between the NC version of FRC and FANNY in [11]. The Lagrangian is as follows.
$$L = \sum_{i=1}^{c}\frac{\sum_{j=1}^{n}\sum_{k=1}^{n} u_{ik}^{m}\,u_{ij}^{m}\,R_{jk}}{2\sum_{t=1}^{n}u_{it}^{m}} + \frac{\sum_{j=1}^{n}\sum_{k=1}^{n} u_{*k}^{m}\,u_{*j}^{m}\,\delta}{2\sum_{t=1}^{n}u_{*t}^{m}} - \sum_{k=1}^{n}\lambda_{k}\Big(\sum_{j=1}^{c+1}u_{jk}-1\Big)$$   (29)
The above can be minimized with respect to uik and
through eliminating the Lagrange multipliers λk, one can
obtain the following for the memberships when m > 1.
$$u_{ik} = \left[\,\sum_{w=1}^{c}\left(\frac{a_{ik}}{a_{wk}}\right)^{1/(m-1)} + \left(\frac{a_{ik}}{a_{*k}}\right)^{1/(m-1)}\right]^{-1}$$   (30)
where the terms aik are given by,
$$a_{ik} = \frac{m\sum_{j=1}^{n}u_{ij}^{m}\,R_{jk}}{\sum_{j=1}^{n}u_{ij}^{m}} \;-\; \frac{m\sum_{j=1}^{n}\sum_{h=1}^{n}u_{ij}^{m}\,u_{ih}^{m}\,R_{jh}}{2\left(\sum_{j=1}^{n}u_{ij}^{m}\right)^{2}}$$   (31)
Thus, by direct application of the Lagrange multiplier technique to the constrained minimization of (28), we obtain the solution for the (c+1)-partition from (30) and
(31). It is noted that in deriving the above, the only
constraint on Rjk has been (8). Thus this derivation has
an advantage over the derivation in RFCM. It should be
clear that (30) can be used to find memberships in good
clusters as well as noise class, while from (31), the
quantity a*k can be obtained by the following simplified
equation,
$$a_{*k} = \frac{m\,\delta}{2}$$   (32)
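This simplification can be checked by substituting (Rjk)* = δ into (31); since δ is constant over j and h, the sums factor out and
$$a_{*k} = \frac{m\,\delta\sum_{j=1}^{n}u_{*j}^{m}}{\sum_{j=1}^{n}u_{*j}^{m}} \;-\; \frac{m\,\delta\Big(\sum_{j=1}^{n}u_{*j}^{m}\Big)^{2}}{2\Big(\sum_{j=1}^{n}u_{*j}^{m}\Big)^{2}} = m\,\delta - \frac{m\,\delta}{2} = \frac{m\,\delta}{2}\,.$$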
One can see a resemblance between (32) and (23). Substitution of (32) in (30) gives,
$$u_{ik} = \left[\,\sum_{w=1}^{c}\left(\frac{a_{ik}}{a_{wk}}\right)^{1/(m-1)} + \left(\frac{a_{ik}}{m\delta/2}\right)^{1/(m-1)}\right]^{-1}$$   (33)
A few observations regarding (33) and (31) are in order.
First, equation (33) is a transcendental equation in uik,
and second, the constraint (7) is not explicitly satisfied.
To solve for uik from (33), one can use a gradient descent
technique such as Newton’s method, or simply use a
successive substitution method, in which one can
repeatedly use old values of uik in (31) to obtain aik and
then solve for new values of uik from (33) till
convergence. In practice, one can improve the order of
convergence of this method by using the Seidel iteration
scheme, where in solving for aik one utilizes all the new
available membership values. In other words, when
computing the aik, the membership values uij for j < k are all newly computed (i.e., from the current iteration), while those for j ≥ k are old values (from the previous iteration).
This is done in the following algorithm for FRC.
NC version of FRC Algorithm
1. For relational data satisfying (8), fix c, 2 ≤ c ≤ n, and m > 1, and initialize the fuzzy (c+1)-partition, uik. Select a value of δ. Initialize a counter p = 0.
2. For each k = 1, ….., n:
a) for each i = 1, ….., c, compute aik from equation (31), using memberships (p+1)uij for j < k and (p)uij for j ≥ k (here the pre-superscript is the iteration number);
b) compute the membership (p+1)uik using (33).
3. Check for convergence using some convenient norm on uik; if converged, stop, else set p = p + 1 and go to step 2.
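A sketch of this Seidel-style successive substitution scheme is shown below (Python with NumPy; it is only an illustration of (31)-(33), not the authors' code — in particular, the clipping of negative aik is our own safeguard, since (7) is not guaranteed, and the sums in (31) are recomputed for each k for clarity rather than efficiency).

import numpy as np

def nc_frc(R, c, delta, m=2.0, max_iter=100, tol=1e-5, seed=0):
    # R     : (n, n) dissimilarity matrix satisfying only (8)
    # delta : noise distance
    # Returns the (c, n) memberships in the good clusters; the noise
    # membership follows from (1).
    n = R.shape[0]
    rng = np.random.default_rng(seed)
    U = rng.random((c, n))
    U /= U.sum(axis=0) + 1.0
    p = 1.0 / (m - 1.0)
    a_star = m * delta / 2.0                           # eq. (32)
    for _ in range(max_iter):
        U_old = U.copy()
        for k in range(n):                             # columns updated in place (Seidel)
            Um = U ** m
            s = Um.sum(axis=1)                         # sum_j u_ij^m for each cluster i
            a = m * (Um @ R[:, k]) / s \
                - m * np.einsum('ij,ih,jh->i', Um, Um, R) / (2.0 * s ** 2)   # eq. (31)
            a = np.maximum(a, 1e-12)                   # (7) is not guaranteed; clip for the sketch
            denom = np.sum((a[None, :] / a[:, None]) ** p, axis=0) + (a / a_star) ** p
            U[:, k] = 1.0 / denom                      # eq. (33)
        if np.max(np.abs(U - U_old)) < tol:
            break
    return U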
As mentioned before, there is no guarantee that
constraint (7) will be satisfied as a result of the above
algorithm. In fact, when any of the aik becomes negative,
then a corresponding uik also becomes negative. Let us
examine (31) to determine the conditions under which aik
are non-negative. Equation (31) is rewritten as follows,
$$a_{ik} = m\left[(R\,V_i)_k - \tfrac{1}{2}\,V_i^{T}R\,V_i\right]$$   (34)
This reveals that (31) is indeed comparable to the right-hand side of (20) in the derivation of RFCM. Hence equations (33) and (24) are also equivalent, since the factor m drops out in (33). It is noted that in rewriting (31) as (34), no further assumptions (such as the relational data being derived from a Euclidean measure) are necessary. Therefore, this result points out that although condition (18) was required in the derivation of RFCM, the actual algorithm may not be as restrictive, since the same equations can also be obtained without requiring (18), as in the FRC derivation shown here. In fact, this may explain
why RFCM works for many non-Euclidean examples as
reported in Bezdek et al. [2]. When the relational data is
derived from Euclidean distance as in (18), then (34)
indicates that aik are indeed related to the Euclidean
distance, because now,
$$a_{ik} = m\,d_{ik}^{2}$$   (35)
hence, for FRC, if the relational data is Euclidean, it will
automatically satisfy the constraint (7) that the
memberships are positive. However, when the relational
data is non-Euclidean, neither RFCM nor FRC will
automatically satisfy (7). It is noted that the NERFCM (non-Euclidean RFCM) in [8] solves this problem at the expense of additional computation. Further discussion on this can be found in [6]. The only way to make sure that (7) is satisfied is to employ that constraint in the minimization procedure as well. This will be a subject of another paper. However, this form of noise-resistant FRC algorithm is derived here to (a) obtain a simple relational clustering algorithm that is based on first principles of optimization and the concept of noise clustering, so that it is robust against noise, and (b) explain the observations in Bezdek et al. [2] regarding why RFCM worked for many non-Euclidean examples.
It is further noted that the NC-FRC and NC-RFCM algorithms are almost identical except for the use of the Seidel technique in NC-FRC. In the
next section, we include two examples to show how noise
resistant versions of FRC and RFCM behave.
5. Numerical examples
Due to space limitation, only two examples are
considered. While the algorithms presented in this paper
are designed specifically for relational data, the use of object data is more relevant when one needs to evaluate how noise points affect the results. We consider a textbook example [3], consisting of 11 objects in two-dimensional space, where one is an outlier and the remaining points are divided into two classes. When the conventional RFCM algorithm is applied to create a 2-partition of the relational data obtained using the Euclidean norm as in (18), the outlier is seen to form one cluster, while the rest of the points form the other cluster. This is shown in Figure 1. When the same example (of converted relational data) is used with NC-RFCM or NC-FRC, the result shown in Fig. 2 is obtained. Fig. 2 shows that the noise-resistant versions can handle biasing outliers in a better way. Please note that while this example may appear to be contrived, for examples where the outliers are evenly distributed around the two original clusters, NC-RFCM or NC-FRC does find the two correct clusters.

Fig. 1. Results of RFCM (“o” denotes cluster 1, while “x” denotes cluster 2).

Fig. 2. Results of NC-RFCM (or NC-FRC), showing two good clusters, while the outlier is classified as noise (“+”).
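To set up this kind of experiment from object data, the Euclidean relational matrix of (18) can be built as in the short sketch below; the coordinates shown are hypothetical placeholders (two tight groups plus one distant point), not the actual 11-point data set of [3].

import numpy as np

def object_to_relational(X):
    # Build the relational matrix of eq. (18): R_jk = ||x_j - x_k||^2.
    diff = X[:, None, :] - X[None, :, :]
    return np.sum(diff ** 2, axis=-1)

# Hypothetical placeholder coordinates: two small groups and one outlier.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [5.0, 0.0], [5.0, 1.0], [6.0, 0.0],
              [2.5, 8.0]])        # the last point plays the role of the outlier
R = object_to_relational(X)
# R can now be passed to NC-RFCM or NC-FRC together with a suitable noise distance.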
The next example is of real relational data from
Kaufman and Rousseeuw (Table 5, Chapter 2) [11], called “countries data” (CD). In this data set, dissimilarities between 12 countries (a 12 × 12 matrix) are obtained by averaging the results of a survey among political science students. Due to space limitations, the table is not provided. When the
original RFCM or FRC algorithms are used, the results are similar to those reported in [11] using FANNY. For a three-class partition, these results show USA, Belgium, France, and Israel as one group (developed countries); Cuba, China, the former USSR, and Yugoslavia as a second group (communist countries); and Brazil, Egypt, India, and Zaire as a third group (developing countries).
However, further analysis of the fuzzy partition reveals
that Egypt is unlike any of the three typical groups.
When the NC versions were run to find three good
classes, it turns out that Egypt is identified as an outlier,
indicating that it really did not belong well in any of the
three groups. In [11], the silhouette plot (page 176) also indicated that Egypt was the “worst clustered” object. Thus this example shows that the noise-resistant versions take care of outliers.
6. Conclusion
It is shown that the concept of noise clustering can be
applied to fuzzy relational data clustering algorithms.
Several popular algorithms are considered and an attempt
is made to show that use of NC makes the resulting
algorithms more robust against noise and outliers. A
generalized version of FANNY is also proposed that includes the fuzzifier exponent m and allows the use of any type of relational data. This version, called the Fuzzy Relational Clustering (FRC) algorithm, is shown to have the same
objective functional as the RFCM algorithm. Unlike [10],
through use of direct objective function minimization
based on the Lagrange multiplier technique, the necessary
conditions for minimization are derived without
imposition of the restriction that the relational data is
derived from Euclidean measure of distance from object
data. As mentioned before, NERFCM [8] can also achieve
this, see [6] for further discussion. The FRC is also made
robust using the NC approach. It is noted here that the conditions for minimization presented here do not automatically satisfy constraint (7), unless the relational data is derived based on (18). Hence, when the
relational data is non-Euclidean, further improvements in
the FRC algorithm [6] are required. It is expected that this
algorithm will be applicable to relational data obtained in
diverse applications, including problems in particle
technology.
Acknowledgements: The authors wish to thank the
referee for many editorial comments. Partial support from
New Jersey Commission on Science and Technology is
gratefully acknowledged.
References
[1] J. C. Bezdek, Pattern Recognition with Fuzzy Objective
Function Algorithms. New York: Plenum, 1981.
[2] J. C. Bezdek, R. J. Hathaway, and M. P. Windham,
“Numerical comparison of the RFCM and AP algorithms
for clustering relational data,” Pattern Recognition, vol. 27,
pp. 429-437, 1997.
[3] R. N. Davé, “Characterization and detection of noise in
clustering,” Pattern Recognition Letters, vol. 12, pp. 657-664, 1991.
[4] R. N. Davé and R. Krishnapuram, “Robust Clustering
Methods: A Unified View,” IEEE Trans. Fuzzy Systems,
vol. 5, pp. 270-293, May 1997.
[5] R. N. Davé and S. Sen, “On generalizing the noise
clustering algorithm,” in Proc. Seventh International Fuzzy
Systems Association World Congress: IFSA '97, Prague,
Czech Republic, June, 1997, pp. 205-210.
[6] R. N. Davé and S. Sen, “Robust fuzzy clustering of
relational data”, submitted, IEEE T. Fuzzy Systems, 1997.
[7] K. C. Gowda and E. Diday, “Symbolic clustering using a
new similarity measure,” IEEE Trans. System Man.
Cybernetics, vol. 22, pp. 368-378, 1992.
[8] R. J. Hathaway, and J. C. Bezdek, “NERF c-Means:
Non-Euclidean Relational Fuzzy Clustering,” Pattern
Recognition, vol. 27, pp. 429-437, 1994.
[9] R. J. Hathaway, J. C. Bezdek, and J. W. Davenport, “On
relational data versions of c-means algorithms,” Pattern
Recognition Letters, vol. 17, pp. 607-612, 1996.
[10] R. J. Hathaway, J. W. Davenport, and J. C. Bezdek, “Relational duals of the c-means clustering algorithms,”
Pattern Recognition, vol. 22, pp. 205-212, 1988.
[11] L. Kaufman and P. J. Rousseeuw, Finding Groups in
Data: An Introduction to Cluster Analysis. New York:
Wiley, 1990.
[12] M. Roubens, “Pattern classification problems and fuzzy
sets,” Fuzzy Sets and Systems, vol. 1, pp. 239-253, 1978.
[13] E. Ruspini, “Numerical methods for fuzzy clustering,”
Information Sciences, vol. 2, pp. 319-350, 1970.
[14] M. P. Windham, “Numerical classification of proximity
data with assignment measures,” J. of Classification, vol.
2, pp. 157-172, 1985.

More Related Content

What's hot

Topological Inference via Meshing
Topological Inference via MeshingTopological Inference via Meshing
Topological Inference via MeshingDon Sheehy
 
Different techniques for speech recognition
Different  techniques for speech recognitionDifferent  techniques for speech recognition
Different techniques for speech recognitionyashi saxena
 
(DL hacks輪読) How to Train Deep Variational Autoencoders and Probabilistic Lad...
(DL hacks輪読) How to Train Deep Variational Autoencoders and Probabilistic Lad...(DL hacks輪読) How to Train Deep Variational Autoencoders and Probabilistic Lad...
(DL hacks輪読) How to Train Deep Variational Autoencoders and Probabilistic Lad...Masahiro Suzuki
 
Unexpected Default in an Information based model
Unexpected Default in an Information based modelUnexpected Default in an Information based model
Unexpected Default in an Information based modelMatteo Bedini
 
Ch1 representation of signal pg 130
Ch1 representation of signal pg 130Ch1 representation of signal pg 130
Ch1 representation of signal pg 130Prateek Omer
 
(研究会輪読) Weight Uncertainty in Neural Networks
(研究会輪読) Weight Uncertainty in Neural Networks(研究会輪読) Weight Uncertainty in Neural Networks
(研究会輪読) Weight Uncertainty in Neural NetworksMasahiro Suzuki
 
A Subspace Method for Blind Channel Estimation in CP-free OFDM Systems
A Subspace Method for Blind Channel Estimation in CP-free OFDM SystemsA Subspace Method for Blind Channel Estimation in CP-free OFDM Systems
A Subspace Method for Blind Channel Estimation in CP-free OFDM SystemsCSCJournals
 
Optics ordering points to identify the clustering structure
Optics ordering points to identify the clustering structureOptics ordering points to identify the clustering structure
Optics ordering points to identify the clustering structureRajesh Piryani
 
Minimax optimal alternating minimization \\ for kernel nonparametric tensor l...
Minimax optimal alternating minimization \\ for kernel nonparametric tensor l...Minimax optimal alternating minimization \\ for kernel nonparametric tensor l...
Minimax optimal alternating minimization \\ for kernel nonparametric tensor l...Taiji Suzuki
 
Clustering: Large Databases in data mining
Clustering: Large Databases in data miningClustering: Large Databases in data mining
Clustering: Large Databases in data miningZHAO Sam
 
Lec 07 image enhancement in frequency domain i
Lec 07 image enhancement in frequency domain iLec 07 image enhancement in frequency domain i
Lec 07 image enhancement in frequency domain iAli Hassan
 
DBSCAN : A Clustering Algorithm
DBSCAN : A Clustering AlgorithmDBSCAN : A Clustering Algorithm
DBSCAN : A Clustering AlgorithmPınar Yahşi
 
Density Based Clustering
Density Based ClusteringDensity Based Clustering
Density Based ClusteringSSA KPI
 
3.4 density and grid methods
3.4 density and grid methods3.4 density and grid methods
3.4 density and grid methodsKrish_ver2
 
Fast fourier transform
Fast fourier transformFast fourier transform
Fast fourier transformAshraf Khan
 
Ch2 probability and random variables pg 81
Ch2 probability and random variables pg 81Ch2 probability and random variables pg 81
Ch2 probability and random variables pg 81Prateek Omer
 

What's hot (20)

Topological Inference via Meshing
Topological Inference via MeshingTopological Inference via Meshing
Topological Inference via Meshing
 
Different techniques for speech recognition
Different  techniques for speech recognitionDifferent  techniques for speech recognition
Different techniques for speech recognition
 
(DL hacks輪読) How to Train Deep Variational Autoencoders and Probabilistic Lad...
(DL hacks輪読) How to Train Deep Variational Autoencoders and Probabilistic Lad...(DL hacks輪読) How to Train Deep Variational Autoencoders and Probabilistic Lad...
(DL hacks輪読) How to Train Deep Variational Autoencoders and Probabilistic Lad...
 
Clique and sting
Clique and stingClique and sting
Clique and sting
 
45
4545
45
 
Unexpected Default in an Information based model
Unexpected Default in an Information based modelUnexpected Default in an Information based model
Unexpected Default in an Information based model
 
Ch1 representation of signal pg 130
Ch1 representation of signal pg 130Ch1 representation of signal pg 130
Ch1 representation of signal pg 130
 
(研究会輪読) Weight Uncertainty in Neural Networks
(研究会輪読) Weight Uncertainty in Neural Networks(研究会輪読) Weight Uncertainty in Neural Networks
(研究会輪読) Weight Uncertainty in Neural Networks
 
A Subspace Method for Blind Channel Estimation in CP-free OFDM Systems
A Subspace Method for Blind Channel Estimation in CP-free OFDM SystemsA Subspace Method for Blind Channel Estimation in CP-free OFDM Systems
A Subspace Method for Blind Channel Estimation in CP-free OFDM Systems
 
Optics ordering points to identify the clustering structure
Optics ordering points to identify the clustering structureOptics ordering points to identify the clustering structure
Optics ordering points to identify the clustering structure
 
Minimax optimal alternating minimization \\ for kernel nonparametric tensor l...
Minimax optimal alternating minimization \\ for kernel nonparametric tensor l...Minimax optimal alternating minimization \\ for kernel nonparametric tensor l...
Minimax optimal alternating minimization \\ for kernel nonparametric tensor l...
 
Masters Report 3
Masters Report 3Masters Report 3
Masters Report 3
 
Clustering: Large Databases in data mining
Clustering: Large Databases in data miningClustering: Large Databases in data mining
Clustering: Large Databases in data mining
 
Lec 07 image enhancement in frequency domain i
Lec 07 image enhancement in frequency domain iLec 07 image enhancement in frequency domain i
Lec 07 image enhancement in frequency domain i
 
Unit ii
Unit iiUnit ii
Unit ii
 
DBSCAN : A Clustering Algorithm
DBSCAN : A Clustering AlgorithmDBSCAN : A Clustering Algorithm
DBSCAN : A Clustering Algorithm
 
Density Based Clustering
Density Based ClusteringDensity Based Clustering
Density Based Clustering
 
3.4 density and grid methods
3.4 density and grid methods3.4 density and grid methods
3.4 density and grid methods
 
Fast fourier transform
Fast fourier transformFast fourier transform
Fast fourier transform
 
Ch2 probability and random variables pg 81
Ch2 probability and random variables pg 81Ch2 probability and random variables pg 81
Ch2 probability and random variables pg 81
 

Viewers also liked

ELEIÇÕES 2012 - ASTORGA: BRUNO HENRIQUE 12123
ELEIÇÕES 2012 - ASTORGA: BRUNO HENRIQUE 12123ELEIÇÕES 2012 - ASTORGA: BRUNO HENRIQUE 12123
ELEIÇÕES 2012 - ASTORGA: BRUNO HENRIQUE 12123Joao Carlos Passari
 
Kiran Kumari - Resume
Kiran Kumari - ResumeKiran Kumari - Resume
Kiran Kumari - ResumeKiran Kumari
 
2012-02-27-팀포퐁 시스템 변경안
2012-02-27-팀포퐁 시스템 변경안 2012-02-27-팀포퐁 시스템 변경안
2012-02-27-팀포퐁 시스템 변경안 Team POPONG
 
Aicc slides interface
Aicc slides interfaceAicc slides interface
Aicc slides interfaceJerry Arnold
 
Oea e goverment-trabajo-final
Oea e goverment-trabajo-finalOea e goverment-trabajo-final
Oea e goverment-trabajo-finaljin_group
 
Mitos completo
Mitos completoMitos completo
Mitos completo925713086
 
Network Planning and Optimization
Network Planning and OptimizationNetwork Planning and Optimization
Network Planning and OptimizationEM Legacy
 
265570212 ensayo-debilidades-de-la-norma-iso-9126
265570212 ensayo-debilidades-de-la-norma-iso-9126265570212 ensayo-debilidades-de-la-norma-iso-9126
265570212 ensayo-debilidades-de-la-norma-iso-9126Andreita Guevara Trujillo
 

Viewers also liked (16)

Ple
PlePle
Ple
 
Unidad 1
Unidad 1Unidad 1
Unidad 1
 
ELEIÇÕES 2012 - ASTORGA: BRUNO HENRIQUE 12123
ELEIÇÕES 2012 - ASTORGA: BRUNO HENRIQUE 12123ELEIÇÕES 2012 - ASTORGA: BRUNO HENRIQUE 12123
ELEIÇÕES 2012 - ASTORGA: BRUNO HENRIQUE 12123
 
Kiran Kumari - Resume
Kiran Kumari - ResumeKiran Kumari - Resume
Kiran Kumari - Resume
 
Las apps
Las appsLas apps
Las apps
 
2012-02-27-팀포퐁 시스템 변경안
2012-02-27-팀포퐁 시스템 변경안 2012-02-27-팀포퐁 시스템 변경안
2012-02-27-팀포퐁 시스템 변경안
 
Edward CV 3
Edward CV 3Edward CV 3
Edward CV 3
 
Aicc slides p8
Aicc slides p8Aicc slides p8
Aicc slides p8
 
Aicc slides interface
Aicc slides interfaceAicc slides interface
Aicc slides interface
 
Aicc slides p1
Aicc slides p1Aicc slides p1
Aicc slides p1
 
Oea e goverment-trabajo-final
Oea e goverment-trabajo-finalOea e goverment-trabajo-final
Oea e goverment-trabajo-final
 
Identity Package
Identity PackageIdentity Package
Identity Package
 
Mitos completo
Mitos completoMitos completo
Mitos completo
 
Mapa conceptual ishikawa
Mapa conceptual ishikawaMapa conceptual ishikawa
Mapa conceptual ishikawa
 
Network Planning and Optimization
Network Planning and OptimizationNetwork Planning and Optimization
Network Planning and Optimization
 
265570212 ensayo-debilidades-de-la-norma-iso-9126
265570212 ensayo-debilidades-de-la-norma-iso-9126265570212 ensayo-debilidades-de-la-norma-iso-9126
265570212 ensayo-debilidades-de-la-norma-iso-9126
 

Similar to Clustering Relational Data with Noise and Outliers

Iterative Soft Decision Based Complex K-best MIMO Decoder
Iterative Soft Decision Based Complex K-best MIMO DecoderIterative Soft Decision Based Complex K-best MIMO Decoder
Iterative Soft Decision Based Complex K-best MIMO DecoderCSCJournals
 
Iterative Soft Decision Based Complex K-best MIMO Decoder
Iterative Soft Decision Based Complex K-best MIMO DecoderIterative Soft Decision Based Complex K-best MIMO Decoder
Iterative Soft Decision Based Complex K-best MIMO DecoderCSCJournals
 
ALEXANDER FRACTIONAL INTEGRAL FILTERING OF WAVELET COEFFICIENTS FOR IMAGE DEN...
ALEXANDER FRACTIONAL INTEGRAL FILTERING OF WAVELET COEFFICIENTS FOR IMAGE DEN...ALEXANDER FRACTIONAL INTEGRAL FILTERING OF WAVELET COEFFICIENTS FOR IMAGE DEN...
ALEXANDER FRACTIONAL INTEGRAL FILTERING OF WAVELET COEFFICIENTS FOR IMAGE DEN...sipij
 
Using Subspace Pursuit Algorithm to Improve Performance of the Distributed Co...
Using Subspace Pursuit Algorithm to Improve Performance of the Distributed Co...Using Subspace Pursuit Algorithm to Improve Performance of the Distributed Co...
Using Subspace Pursuit Algorithm to Improve Performance of the Distributed Co...Polytechnique Montreal
 
A Review on Image Denoising using Wavelet Transform
A Review on Image Denoising using Wavelet TransformA Review on Image Denoising using Wavelet Transform
A Review on Image Denoising using Wavelet Transformijsrd.com
 
AUTOMATIC THRESHOLDING TECHNIQUES FOR SAR IMAGES
AUTOMATIC THRESHOLDING TECHNIQUES FOR SAR IMAGESAUTOMATIC THRESHOLDING TECHNIQUES FOR SAR IMAGES
AUTOMATIC THRESHOLDING TECHNIQUES FOR SAR IMAGEScscpconf
 
AUTOMATIC THRESHOLDING TECHNIQUES FOR SAR IMAGES
AUTOMATIC THRESHOLDING TECHNIQUES FOR SAR IMAGESAUTOMATIC THRESHOLDING TECHNIQUES FOR SAR IMAGES
AUTOMATIC THRESHOLDING TECHNIQUES FOR SAR IMAGEScsitconf
 
Application of Fuzzy Algebra in Coding Theory
Application of Fuzzy Algebra in Coding TheoryApplication of Fuzzy Algebra in Coding Theory
Application of Fuzzy Algebra in Coding TheoryIJERA Editor
 
Application of Fuzzy Algebra in Coding Theory
Application of Fuzzy Algebra in Coding TheoryApplication of Fuzzy Algebra in Coding Theory
Application of Fuzzy Algebra in Coding TheoryIJERA Editor
 
IMPROVEMENT OF BM3D ALGORITHM AND EMPLOYMENT TO SATELLITE AND CFA IMAGES DENO...
IMPROVEMENT OF BM3D ALGORITHM AND EMPLOYMENT TO SATELLITE AND CFA IMAGES DENO...IMPROVEMENT OF BM3D ALGORITHM AND EMPLOYMENT TO SATELLITE AND CFA IMAGES DENO...
IMPROVEMENT OF BM3D ALGORITHM AND EMPLOYMENT TO SATELLITE AND CFA IMAGES DENO...ijistjournal
 
IMPROVEMENT OF BM3D ALGORITHM AND EMPLOYMENT TO SATELLITE AND CFA IMAGES DENO...
IMPROVEMENT OF BM3D ALGORITHM AND EMPLOYMENT TO SATELLITE AND CFA IMAGES DENO...IMPROVEMENT OF BM3D ALGORITHM AND EMPLOYMENT TO SATELLITE AND CFA IMAGES DENO...
IMPROVEMENT OF BM3D ALGORITHM AND EMPLOYMENT TO SATELLITE AND CFA IMAGES DENO...ijistjournal
 
cs.ds-2211.13454.pdf
cs.ds-2211.13454.pdfcs.ds-2211.13454.pdf
cs.ds-2211.13454.pdfssuser866937
 
Speech Processing in Stressing Co-Channel Interference Using the Wigner Distr...
Speech Processing in Stressing Co-Channel Interference Using the Wigner Distr...Speech Processing in Stressing Co-Channel Interference Using the Wigner Distr...
Speech Processing in Stressing Co-Channel Interference Using the Wigner Distr...CSCJournals
 
Interferogram Filtering Using Gaussians Scale Mixtures in Steerable Wavelet D...
Interferogram Filtering Using Gaussians Scale Mixtures in Steerable Wavelet D...Interferogram Filtering Using Gaussians Scale Mixtures in Steerable Wavelet D...
Interferogram Filtering Using Gaussians Scale Mixtures in Steerable Wavelet D...CSCJournals
 
On the approximation of the sum of lognormals by a log skew normal distribution
On the approximation of the sum of lognormals by a log skew normal distributionOn the approximation of the sum of lognormals by a log skew normal distribution
On the approximation of the sum of lognormals by a log skew normal distributionIJCNCJournal
 
Paper id 24201464
Paper id 24201464Paper id 24201464
Paper id 24201464IJRAT
 
Research Inventy : International Journal of Engineering and Science
Research Inventy : International Journal of Engineering and ScienceResearch Inventy : International Journal of Engineering and Science
Research Inventy : International Journal of Engineering and Scienceinventy
 

Similar to Clustering Relational Data with Noise and Outliers (20)

www.ijerd.com
www.ijerd.comwww.ijerd.com
www.ijerd.com
 
Iterative Soft Decision Based Complex K-best MIMO Decoder
Iterative Soft Decision Based Complex K-best MIMO DecoderIterative Soft Decision Based Complex K-best MIMO Decoder
Iterative Soft Decision Based Complex K-best MIMO Decoder
 
Iterative Soft Decision Based Complex K-best MIMO Decoder
Iterative Soft Decision Based Complex K-best MIMO DecoderIterative Soft Decision Based Complex K-best MIMO Decoder
Iterative Soft Decision Based Complex K-best MIMO Decoder
 
ALEXANDER FRACTIONAL INTEGRAL FILTERING OF WAVELET COEFFICIENTS FOR IMAGE DEN...
ALEXANDER FRACTIONAL INTEGRAL FILTERING OF WAVELET COEFFICIENTS FOR IMAGE DEN...ALEXANDER FRACTIONAL INTEGRAL FILTERING OF WAVELET COEFFICIENTS FOR IMAGE DEN...
ALEXANDER FRACTIONAL INTEGRAL FILTERING OF WAVELET COEFFICIENTS FOR IMAGE DEN...
 
Using Subspace Pursuit Algorithm to Improve Performance of the Distributed Co...
Using Subspace Pursuit Algorithm to Improve Performance of the Distributed Co...Using Subspace Pursuit Algorithm to Improve Performance of the Distributed Co...
Using Subspace Pursuit Algorithm to Improve Performance of the Distributed Co...
 
A Review on Image Denoising using Wavelet Transform
A Review on Image Denoising using Wavelet TransformA Review on Image Denoising using Wavelet Transform
A Review on Image Denoising using Wavelet Transform
 
AUTOMATIC THRESHOLDING TECHNIQUES FOR SAR IMAGES
AUTOMATIC THRESHOLDING TECHNIQUES FOR SAR IMAGESAUTOMATIC THRESHOLDING TECHNIQUES FOR SAR IMAGES
AUTOMATIC THRESHOLDING TECHNIQUES FOR SAR IMAGES
 
AUTOMATIC THRESHOLDING TECHNIQUES FOR SAR IMAGES
AUTOMATIC THRESHOLDING TECHNIQUES FOR SAR IMAGESAUTOMATIC THRESHOLDING TECHNIQUES FOR SAR IMAGES
AUTOMATIC THRESHOLDING TECHNIQUES FOR SAR IMAGES
 
Application of Fuzzy Algebra in Coding Theory
Application of Fuzzy Algebra in Coding TheoryApplication of Fuzzy Algebra in Coding Theory
Application of Fuzzy Algebra in Coding Theory
 
Application of Fuzzy Algebra in Coding Theory
Application of Fuzzy Algebra in Coding TheoryApplication of Fuzzy Algebra in Coding Theory
Application of Fuzzy Algebra in Coding Theory
 
IMPROVEMENT OF BM3D ALGORITHM AND EMPLOYMENT TO SATELLITE AND CFA IMAGES DENO...
IMPROVEMENT OF BM3D ALGORITHM AND EMPLOYMENT TO SATELLITE AND CFA IMAGES DENO...IMPROVEMENT OF BM3D ALGORITHM AND EMPLOYMENT TO SATELLITE AND CFA IMAGES DENO...
IMPROVEMENT OF BM3D ALGORITHM AND EMPLOYMENT TO SATELLITE AND CFA IMAGES DENO...
 
IMPROVEMENT OF BM3D ALGORITHM AND EMPLOYMENT TO SATELLITE AND CFA IMAGES DENO...
IMPROVEMENT OF BM3D ALGORITHM AND EMPLOYMENT TO SATELLITE AND CFA IMAGES DENO...IMPROVEMENT OF BM3D ALGORITHM AND EMPLOYMENT TO SATELLITE AND CFA IMAGES DENO...
IMPROVEMENT OF BM3D ALGORITHM AND EMPLOYMENT TO SATELLITE AND CFA IMAGES DENO...
 
Dycops2019
Dycops2019 Dycops2019
Dycops2019
 
cs.ds-2211.13454.pdf
cs.ds-2211.13454.pdfcs.ds-2211.13454.pdf
cs.ds-2211.13454.pdf
 
Speech Processing in Stressing Co-Channel Interference Using the Wigner Distr...
Speech Processing in Stressing Co-Channel Interference Using the Wigner Distr...Speech Processing in Stressing Co-Channel Interference Using the Wigner Distr...
Speech Processing in Stressing Co-Channel Interference Using the Wigner Distr...
 
Interferogram Filtering Using Gaussians Scale Mixtures in Steerable Wavelet D...
Interferogram Filtering Using Gaussians Scale Mixtures in Steerable Wavelet D...Interferogram Filtering Using Gaussians Scale Mixtures in Steerable Wavelet D...
Interferogram Filtering Using Gaussians Scale Mixtures in Steerable Wavelet D...
 
Wavelet
WaveletWavelet
Wavelet
 
On the approximation of the sum of lognormals by a log skew normal distribution
On the approximation of the sum of lognormals by a log skew normal distributionOn the approximation of the sum of lognormals by a log skew normal distribution
On the approximation of the sum of lognormals by a log skew normal distribution
 
Paper id 24201464
Paper id 24201464Paper id 24201464
Paper id 24201464
 
Research Inventy : International Journal of Engineering and Science
Research Inventy : International Journal of Engineering and ScienceResearch Inventy : International Journal of Engineering and Science
Research Inventy : International Journal of Engineering and Science
 

Clustering Relational Data with Noise and Outliers

  • 1. Clustering of Relational Data Containing Noise and Outliers Sumit Sen and Rajesh N. Davé Department of Mechanical Engineering New Jersey Institute of Technology University Heights, Newark, NJ 07102-1982  Corresponding author (dave@shiva.njit.edu) Abstract The concept of noise clustering (NC) algorithm is applied to several fuzzy relational data clustering algorithms to make them more robust against noise and outliers. The methods considered here include techniques proposed by Roubens, RFCM of Hathaway, et al. and FANNY by Kaufman and Rouseeuw. A new fuzzy relational data clustering (FRC) algorithm is proposed through generalization of FANNY. The FRC algorithm, is shown to have the same objective functional as the RFCM algorithm. However, through use of direct objective function minimization based on the Lagrange multiplier technique, the necessary conditions for minimization are derived without imposition of the restriction that the relational data is derived from Euclidean measure of distance from object data. Robustness of the new algorithm is demonstrated through several examples. 1. Introduction Relational data comes from a measure of dissimilarity (or similarity) between objects, and in some cases it is actually based on object data. Relational data can also be based on subjective expert knowledge, see for example the microcomputer data in Gowda and Diday [7], or subjective dissimilarity between countries in Kaufman and Rouseeuw [11]. Fuzzy techniques for clustering relational data include the methods by Ruspini [13], Roubens [12], Windham [14], Hathaway et al. [10], and Kaufman and Rouseeuw [11]. It is claimed in [11] that use of L1 norm in the FANNY algorithm makes it somewhat robust against noise. However, for a general form of relational data, such claim is not valid because L1 norm may not be used. Hence, in general, these methods are sensitive to noise and outliers in the data. Recently, several techniques have been introduced to increase robustness of algorithms for clustering of object data, see for example Davé [3], and a review by Davé and Krishnapurum [4]. Davé [3] proposed the concept of noise clustering for making the fuzzy c-means (FCM) [1] and related object data algorithms robust against noise. Consequently, use of such technique in all the derivatives of FCM type object data clustering algorithms would make those algorithms robust against noise. Based on this observation, Hathaway et al. [9] have suggested that incorporation of the concept of noise clustering [3] in their relational dual of fuzzy c-means (FCM) algorithm, called RFCM, would make it robust. However, no results or specific algorithm were presented. In this paper, application of the concept of noise clustering is considered to specifically address the problem of robustness in popular relational clustering techniques. This includes the techniques by Roubens [12], Hathaway et al. [10] (RFCM) and development of a generalized version of FANNYalgorithmdue to Kaufman and Rouseeuw [11] through an approach which is based on directly converting the original functional to a noise clustering functional. 2. Noise clustering technique Noise clustering (NC) was specifically introduced for making FCM and related algorithms robust. The following discussion considers NC techniques for the FCM for object data clustering. In the NC technique proposed in [3], noise is considered to be a separate class, and its prototype has the same distance, , from all the feature vectors. 
The membership u*j of a data point xj in the noise cluster is defined as, u*j = 1 - uij i c  1 . (1) where c is the number of clusters and uij denotes the grade of membership (belonging) of point xj in the ith fuzzy subset of X. Since (1) is used to define the membership u*j in the noise class, the usual membership constraint of FCM is not required. Thus, the membership constraint for the good clusters is effectively relaxed to uij i c  1  1. (2)
  • 2. This allows noise points to have arbitrarily small membership values in good clusters. The objective function is given as J =    u d x uij m j i ij i c j n j n i c m 2 2 1111 1,           (3) In (3), d2(xj,i) is the distance from a feature point xj to the prototype i. The equation for the memberships is given as, u d d ij ij kjk C m m m                          1 1 1 2 2 2 1 1 1 1 1 1 1 ( ) ( ) ( )  (4) In the above, dij is equivalent to d2(xj,i). The memberships for FCM do not have the second term in their denominators, and thus, the NC memberships are different. In the next section, the concepts of NC technique are applied to existing relational data clustering techniques. 3. NC applied to relational clustering The concept of noise clustering works well for object data clustering methods such as FCM, as the definition of the noise distance  has a direct physical meaning. In object data clustering, there are object prototypes, and hence there is a noise prototype. The extension of noise clustering to relational data clustering techniques is not obvious, because in a strict sense, there are no cluster (and hence noise) prototypes in relational clustering, and there is only a need to generate a partition. We consider Roubens [12] method first. For n objects, the relational data is usually a n n matrix (if the relational measure Rij between objects i and j is symmetric, i.e.¸ Rij = Rji then only the lower triangular portion of n n matrix is required). Roubens considers the following functional. F u u RR ik ij jk k n j n i c    2 2 111 (5) subject to constraints u k nik i c     1 1 1 2, , ,...., (6) u i c k nik   0 12 1 2, , ,......, ; , ,...., (7) R R R Rij ii ij ji  0 0, , .and (8) where there are n objects, and c clusters. This can be converted to noise clustering by adding a noise class, thus making the number of clusters c + 1. Then, the new functional becomes, N R ik ij jk k n j n i c F u u R    2 2 111 1 (9) In equation (9), the pre-superscript “N” denotes extension to noise clustering. In (9) it is not obvious how to introduce the noise distance, since the noise distance in [3] is defined as a distance fromthe noise prototype to the object data point. In relational data, since there is no explicit object data available, one must modify the definition of the noise distance. For this purpose, (9) is rewritten as follows. N R ik ij jk i k n j n i c k j jk k n j n F u u R u u R     2 2 111 2 2 11 ( ) ( )* * * (10) Equation (10) is used with the membership constraint from (1) instead of (6), thus explicitly relating the noise membership to the other memberships. In (10), the first term on right hand side is same as the original Roubens functional, while the second term is due to the extension to noise clustering. Another modification here is the extra subscript to the dissimilarity distance - (Rjk)i - denoting that this is the “value” of dissimilarity between objects j and k as viewed by class i. Normally, the dissimilarity should be independent of the class, thus Rjk = (Rjk)i for all i. However, when introducing the noise class, we must make a distinction that it is a special class and it imposes its own bias (or lack there of) to determine the “amount” of dissimilarity. Then analogous to the original noise clustering, we specify that the noise class views all dissimilarities as equal. Thus, (Rjk)i = , the dissimilarity noise distance. 
This noise distance, can be the same for all cases, or in a manner similar to the generalized noise clustering [5], it could take different values for different pairs of points as well as clusters. In this paper we restrict this to be a constant value, and thus (10) is written as, N R ik ij jk k n j n i c k j k n j n F u u R u u     2 2 111 2 2 11 * *  (11) An algorithmto minimize (11) follows: NC version of Roubens Algorithm 1. For relational data satisfying (8), fixc, 2  c  n, and initialize fuzzy (c+1)-partition, uik. Select noise distance,  0. 2. Compute terms Dik defined as below D u Rik ij jk j n    2 1 (12) and the noise term, D u Nk j j n nc* *    2 1   , (13)
  • 3. where Nnc is the equivalent fuzzy cardinality of noise class, i.e., N unc j j n = .* 2 1  (14) Note that all these terms are  0. 3. Compute memberships by solving the new minimization problemthat resembles the original noise clustering for FCM formulation: min * * u u D u D ik ik ik k n i c k k k n 2 11 2 1      (15) to obtain the memberships as u D D N ik ik jk ncj C                       1 1 1 1  (16) 4. Check for termination using a convenient normon uik and if terminated stop, else go to step 2. In the above, it is easy to see how the noise distance appears in the solution procedure. It is noted that the term D*k in (13) is a product of the noise distance  and the equivalent fuzzy cardinality of noise class defined in (14). In step 3, one can also easily compute the membership in the noise class as shown in (16). Next we consider the RFCM algorithm of Hathaway et al. [10]. Since the functional of RFCM is basically a normalized version of Roubens functional, it can be extended to noise clustering in a similar way as follows: N RFCM ik m ij m jk k n j n it m t n i c k m j m k n j n t m t nF u u R u u u u             11 1 1 11 1 2 2 * * *  (17) Following the arguments in [10], it is clear that the derivation requires that the relational data be obtained from Euclidean distances. Thus besides, constraints (6)-(8), one more constraint is required as, R = d x xjk jk j k 2 2   for j, k = 1, ….., n (18) The following equation for the membership vector can be used for computing the first c vectors Vi, V u u u ui i i in T ik k n   ( , ,......., )1 2 1 (19) where the Vi represents a mean (i.e. averaged) unit vector of memberships for the ith cluster. These are then used to obtain object to cluster distances, dik, as following for the first c classes. d RV V RV R R dik i k i T i jk jk 2 2 2   ( ) ( )/ [ ] [ ],with (20) However, the noise membership vector (membership of objects to the noise class) is computed as below. V u u u un T k k n * * * * *( , ,......., )  1 2 1 (21) These are then used to obtain object to noise cluster distances, d*k, as follows using equation (20). d RV V RV R Rk k T jk* * * * *( ) ( ) / [ ] ,2 2   with  (22) In (22), j and k are two objects, and index* represents the noise cluster. Although it may not be apparent, in this noise cluster extension, the dissimilarity distance - (Rjk)i - is viewed differently by each class, thus the dissimilarity distances in (22) are all same as because this equation is written specifically for the noise class. This can be simplified to obtain that the object to noise class distance is directly related to  as d k* / ,2 2  (23) It is clear that (21) is not even required for evaluating object to noise cluster distances, d*k, as those terms drop out from(22). Subsequently, the original RFCM equation for memberships can be modified to obtain memberships in good classes as u d d ik ik m wk m m w c                          1 1 2 2 1 1 2 1 1 1 1 1 / ( ) / ( ) / ( )  (24) and the memberships in the noise class is u d k m wk m m w c* / ( ) / ( ) /( )                          2 1 2 1 1 2 1 1 1 1 1   (25) The robust version of the RFCM algorithmis: NC version of RFCM Algorithm 1. For relational data satisfying (18), fixc, 2  c  n, and m > 1, and initialize fuzzy (c+1) -partition, uik. Select noise distance , and compute object to noise cluster distances, d*k from(23). 2. 
Compute c mean vectors Vi from(19) and then compute distances, dik from(20). 3. Update memberships, uik from(24) and the noise memberships (if required) from(25). 4. Check for termination using a convenient normon uik and if terminated stop, else go to step 2. A careful analysis of the above algorithm shows that the only major difference between this and the original
  • 4. RFCM is in equation (24), in the second term in the denominator of left hand side. It may be seen that one does not require explicit computation of noise memberships, and thus there is only a minor modification necessary fromthe old algorithmto the new one, which is in terms of adding one single term to (24). This indicates the simplicity of this approach. The above two algorithms can be coded to check how well the noise concept works for relational data. Since Roubens original algorithm is considered to have stability problems [14], it is not considered here in terms of results. However, the NC version of RFCM can be tested on noisy data. This is reported in Section 5. In the next section, the functional of RFCM from (17) is considered, and an optimization algorithm based on work presented in [11] is derived without utilizing the constraint from (18). It is easy to see that constraint (18) is much more restrictive as compared to (8). 4. A generalized robust version of FANNY A fuzzy relational data clustering algorithm called FANNY (Fuzzy Analysis) [11] considers an objective functional very similar to original RFCM functional. The FANNYfunctional is,         c i n t it n j n k jkijik FANNY u Ruu F 1 1 2 1 1 22 2 (26) with the membership constraints from (6) and (7). In the above, Rjk is the distance or dissimilarity between objects j and k, and it is implied [11] to be the L1 distance. The reader is referred to [11] for details of derivation of an algorithm that is based on application of Lagrange multiplier and Kuhn-Tucker conditions to directly minimize (26) subject to the constraints (6), (7) and (8). In this paper, we generalize FANNY as follows. The membership fuzzifying exponent m, is used in (26) along with Rjk to denote any dissimilarity measure. Based on that, one obtains a functional shown below that looks exactly same as the original RFCM functional. F u u R u FRC ik m ij m jk k n j n it m t n i c        11 1 1 2 (27) In the above, the subscript FRC stands for Fuzzy Relational Clustering, which is an extension of FANNY technique. To reiterate, the difference between the two are; (a) the fuzzifier m, which makes the fuzzy memberships more general, and (b) the implication that while the relational data in FANNYusually comes from L1 norm, in FRC it could be from any dissimilarity measure. The difference due to the use of the fuzzifier m becomes an important issue when FRC is made robust using the concept of noise clustering [3]. Hereafter, the version in (27) is referred to as FRC. While (27) looks like the original RFCM, the important difference is that constraint (8) is used here instead of (18). The FRC is extended to noise clustering concept by adding a noise cluster, thus modifying (27) to look like (17) as below. N FRC ik m ij m jk k n j n it m t n i c k m j m k n j n t m t n F u u R u u u u             11 1 1 11 1 2 2 * * *  (28) To derive the necessary conditions for the minimization of (28), a Lagrangian is constructed based on the constraint (1), while the inequality constraint in (7) is ignored with a hope that it may be automatically satisfied. This treatment is similar to the derivation of the original FCM algorithm, where the inequality constraint was not directly included in the optimization problem. Thus there are many differences between NC version of FRC and FANNYin [11]. The Lagrangian is as follows. 
The FRC is extended to the noise clustering concept by adding a noise cluster, thus modifying (27) to look like (17), as below.

F_{FRC}^{N} = \sum_{i=1}^{c} \frac{\sum_{j=1}^{n}\sum_{k=1}^{n} u_{ik}^{m}\,u_{ij}^{m}\,R_{jk}}{2\sum_{t=1}^{n} u_{it}^{m}} + \frac{\sum_{j=1}^{n}\sum_{k=1}^{n} u_{*k}^{m}\,u_{*j}^{m}\,(2\delta^{2})}{2\sum_{t=1}^{n} u_{*t}^{m}}     (28)

To derive the necessary conditions for the minimization of (28), a Lagrangian is constructed based on the constraint (1), while the inequality constraint in (7) is ignored in the hope that it will be satisfied automatically. This treatment is similar to the derivation of the original FCM algorithm, where the inequality constraint was not directly included in the optimization problem. Thus there are many differences between the NC version of FRC and FANNY in [11]. The Lagrangian is as follows.

L = \sum_{i=1}^{c} \frac{\sum_{j=1}^{n}\sum_{k=1}^{n} u_{ik}^{m}\,u_{ij}^{m}\,R_{jk}}{2\sum_{t=1}^{n} u_{it}^{m}} + \frac{\sum_{j=1}^{n}\sum_{k=1}^{n} u_{*k}^{m}\,u_{*j}^{m}\,(2\delta^{2})}{2\sum_{t=1}^{n} u_{*t}^{m}} - \sum_{k=1}^{n} \lambda_{k}\left(\sum_{j=1}^{c} u_{jk} + u_{*k} - 1\right)     (29)

The above is minimized with respect to uik, and after eliminating the Lagrange multipliers λk one obtains the following expression for the memberships when m > 1:

u_{ik} = \left[\sum_{w=1}^{c}\left(\frac{a_{ik}}{a_{wk}}\right)^{1/(m-1)} + \left(\frac{a_{ik}}{a_{*k}}\right)^{1/(m-1)}\right]^{-1}     (30)

where the terms aik are given by

a_{ik} = m\left[\frac{\sum_{j=1}^{n} u_{ij}^{m}\,R_{jk}}{\sum_{j=1}^{n} u_{ij}^{m}} - \frac{\sum_{j=1}^{n}\sum_{h=1}^{n} u_{ij}^{m}\,u_{ih}^{m}\,R_{jh}}{2\left(\sum_{v=1}^{n} u_{iv}^{m}\right)^{2}}\right]     (31)

Thus, by direct application of the Lagrange multiplier technique to the constrained minimization of (28), the solution for the c+1-partition is obtained from (30) and (31). It is noted that in deriving the above, the only constraint on Rjk has been (8). Thus this derivation has an advantage over the derivation of RFCM.
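For readers who want the omitted intermediate step, the elimination of the multipliers λk proceeds in the standard way; the following is our own sketch using the notation of (29)-(31).

\frac{\partial L}{\partial u_{ik}} = u_{ik}^{m-1}\,a_{ik} - \lambda_{k} = 0
\;\;\Longrightarrow\;\;
u_{ik} = \left(\frac{\lambda_{k}}{a_{ik}}\right)^{1/(m-1)},\qquad
u_{*k} = \left(\frac{\lambda_{k}}{a_{*k}}\right)^{1/(m-1)}.

Enforcing the constraint \sum_{w=1}^{c} u_{wk} + u_{*k} = 1 then gives

\lambda_{k}^{1/(m-1)} = \left[\sum_{w=1}^{c}\left(\frac{1}{a_{wk}}\right)^{1/(m-1)} + \left(\frac{1}{a_{*k}}\right)^{1/(m-1)}\right]^{-1},

and substituting this value of \lambda_{k} back into the expressions for u_{ik} and u_{*k} yields (30).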
It should be clear that (30) can be used to find the memberships in the good clusters as well as in the noise class, while from (31) the quantity a*k reduces to the following simplified expression,

a_{*k} = m\,\delta^{2}     (32)

where one can see a resemblance between (32) and (23). Substituting (32) in (30) gives

u_{ik} = \left[\sum_{w=1}^{c}\left(\frac{a_{ik}}{a_{wk}}\right)^{1/(m-1)} + \left(\frac{a_{ik}}{m\,\delta^{2}}\right)^{1/(m-1)}\right]^{-1}     (33)

A few observations regarding (33) and (31) are in order. First, equation (33) is a transcendental equation in uik; second, the constraint (7) is not explicitly enforced. To solve for uik from (33), one can use an iterative root-finding technique such as Newton's method, or simply a successive substitution method, in which the old values of uik are used in (31) to obtain aik, new values of uik are then computed from (33), and the process is repeated until convergence. In practice, the order of convergence of this method can be improved by using a Seidel iteration scheme, in which the computation of aik utilizes all the membership values that are already available. In other words, when computing aik, the membership values uij for j < k are the newly computed values (from the current iteration), while those for j ≥ k are the old values (from the previous iteration). This is done in the following algorithm for FRC.

NC version of FRC Algorithm
1. For relational data satisfying (8), fix c, 2 ≤ c ≤ n, and m > 1, and initialize the fuzzy c+1-partition uik. Select a value of δ. Initialize a counter p = 0.
2. For each k = 1, ..., n:
   a) for each i = 1, ..., c, compute aik from equation (31), using the memberships (p+1)uij for j < k and (p)uij for j ≥ k (here the pre-superscript denotes the iteration number);
   b) compute the membership (p+1)uik using (33).
3. Check for convergence using some convenient norm on uik; if converged, stop, else set p = p + 1 and go to step 2.
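A minimal Python sketch of the above algorithm follows; it is our own illustration (the initialization, the small positive guard on aik, and all variable names are assumptions, not from the paper), using the Seidel-style sweep described in the text. Clarity is favored over efficiency: the per-cluster sums are recomputed for every k.

import numpy as np

def nc_frc(R, c, m, delta, max_iter=100, tol=1e-5, seed=0):
    # Sketch of the NC version of the FRC algorithm (successive substitution
    # with Seidel updates). R: (n, n) symmetric dissimilarity matrix satisfying (8).
    n = R.shape[0]
    rng = np.random.default_rng(seed)
    U = rng.random((c, n))
    U /= U.sum(axis=0, keepdims=True)          # start with all membership in good clusters
    for p in range(max_iter):
        U_old = U.copy()
        for k in range(n):                     # Seidel sweep: columns j < k already updated
            Um = U ** m
            s = Um.sum(axis=1)                 # sum_j u_ij^m for each cluster i
            a = m * (Um @ R[:, k] / s
                     - np.einsum('ij,jh,ih->i', Um, R, Um) / (2.0 * s ** 2))  # eq. (31)
            a = np.maximum(a, 1e-12)           # guard: (7) is not guaranteed (see text)
            a_noise = m * delta ** 2           # eq. (32)
            denom = ((a[:, None] / a[None, :]) ** (1.0 / (m - 1.0))).sum(axis=1) \
                    + (a / a_noise) ** (1.0 / (m - 1.0))
            U[:, k] = 1.0 / denom              # eq. (33); noise share is 1 - column sum
        if np.abs(U - U_old).max() < tol:
            break
    u_noise = 1.0 - U.sum(axis=0)
    return U, u_noise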
As mentioned before, there is no guarantee that constraint (7) will be satisfied as a result of the above algorithm. In fact, when any of the aik becomes negative, the corresponding uik also becomes negative. Let us examine (31) to determine the conditions under which the aik are non-negative. Equation (31) can be rewritten as

a_{ik} = m\left[(R\,V_{i})_{k} - \tfrac{1}{2}\,V_{i}^{T} R\,V_{i}\right]     (34)

This reveals that (31) is indeed comparable to the right-hand side of (20) in the derivation of RFCM. Hence equations (33) and (24) are also equivalent, since the factor m drops out of (33). It is noted that in rewriting (31) as (34), no further assumptions (such as the relational data being derived from a Euclidean measure) are necessary. Therefore, this result points out that although condition (18) was required in the derivation of RFCM, the actual algorithm may not be as restrictive, since the same equations can also be obtained without requiring (18), as in the FRC derivation shown here. In fact, this may explain why RFCM works for many non-Euclidean examples, as reported in Bezdek et al. [2]. When the relational data is derived from Euclidean distances as in (18), then (34) indicates that the aik are indeed related to the Euclidean distances, because now

a_{ik} = m\,d_{ik}^{2}     (35)

Hence, for FRC, if the relational data is Euclidean, the constraint (7) that the memberships be non-negative is satisfied automatically. However, when the relational data is non-Euclidean, neither RFCM nor FRC automatically satisfies (7). It is noted that the NERFCM (non-Euclidean RFCM) in [8] solves this problem at some computational expense. Further discussion on this can be found in [6].

The only way to make sure that (7) is satisfied is to include that constraint in the minimization procedure as well; this will be the subject of another paper. However, this form of the noise-resistant FRC algorithm is derived here to (a) obtain a simple relational clustering algorithm that is based on first principles of optimization and on the concept of noise clustering, so that it is robust against noise, and (b) explain the observations in Bezdek et al. [2] regarding why RFCM worked for many non-Euclidean examples. It is further noted that the NC-FRC and NC-RFCM algorithms are almost identical, except for the use of the Seidel technique in NC-FRC. In the next section, two examples are included to show how the noise-resistant versions of FRC and RFCM behave.

5. Numerical examples

Due to space limitations, only two examples are considered. While the algorithms presented in this paper are designed specifically for relational data, the use of object data is more relevant when one needs to evaluate how noise points affect the results.
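For such experiments, the relational matrix can be generated from object data using squared Euclidean distances, as in (18). The sketch below is our own; the sample points are purely hypothetical and are not the data set of [3].

import numpy as np

def object_to_relational(X):
    # Build a relational matrix from object data using squared Euclidean
    # distances, R_jk = ||x_j - x_k||^2, as assumed in (18).
    diff = X[:, None, :] - X[None, :, :]
    return np.einsum('jkd,jkd->jk', diff, diff)

# Illustrative 2-D object data: two small groups plus one far-away point
# (hypothetical values, used only to exercise the conversion).
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [5.0, 0.0], [5.0, 1.0], [6.0, 0.0],
              [3.0, 10.0]])                     # the last point acts as an outlier
R = object_to_relational(X)                      # R can now be fed to NC-RFCM or NC-FRC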
We consider a textbook example [3], consisting of 11 objects in two-dimensional space, where one object is an outlier and the remaining objects are divided into two classes. When the conventional RFCM algorithm is applied to create a 2-partition of the relational data obtained using the Euclidean norm as in (18), the outlier is seen to form one cluster while the rest of the points form the other cluster. This is shown in Figure 1. When the same (converted) relational data is used with NC-RFCM or NC-FRC, the result shown in Figure 2 is obtained. Figure 2 shows that the noise-resistant versions handle biasing outliers in a better way. While this example may appear contrived, for examples where the outliers are evenly distributed around the two original clusters, NC-RFCM and NC-FRC still find the two correct clusters.

Fig. 1. Results of RFCM ("o" denotes cluster 1, while "x" denotes cluster 2).
Fig. 2. Results of NC-RFCM (or NC-FRC), showing two good clusters, while the outlier is classified as noise ("+").

The next example is of real relational data from Kaufman and Rousseeuw (Table 5, Chapter 2) [11], called the "countries data" (CD). In this data set, dissimilarities between 12 countries (a 12x12 matrix) are obtained by averaging the results of a survey among political science students. Due to space limitations, the table is not reproduced here. When the original RFCM or FRC algorithms are used, the results are similar to those reported in [11] using FANNY. For a three-class partition, these results give USA, Belgium, France, and Israel as one group (developed countries); Cuba, China, the former USSR, and Yugoslavia as a second group (communist countries); and Brazil, Egypt, India, and Zaire as a third group (developing countries). However, further analysis of the fuzzy partition reveals that Egypt is unlike any of the three typical groups. When the NC versions were run to find three good classes, Egypt was identified as an outlier, indicating that it really did not belong well in any of the three groups. In [11], the silhouette plot (page 176) also indicated that Egypt was the "worst clustered" object. Thus this example shows that the noise-resistant versions take care of outliers.

6. Conclusion

It is shown that the concept of noise clustering can be applied to fuzzy relational data clustering algorithms. Several popular algorithms are considered, and an attempt is made to show that the use of NC makes the resulting algorithms more robust against noise and outliers. A generalized version of FANNY is also proposed that includes the fuzzifier exponent m and allows the use of any type of relational data. This version, called the Fuzzy Relational Clustering (FRC) algorithm, is shown to have the same objective functional as the RFCM algorithm. Unlike [10], through the use of direct objective function minimization based on the Lagrange multiplier technique, the necessary conditions for minimization are derived without imposing the restriction that the relational data be derived from a Euclidean measure of distance on object data. As mentioned before, NERFCM [8] can also achieve this; see [6] for further discussion. The FRC is also made robust using the NC approach. It is noted here that the conditions for minimization presented here do not automatically satisfy the constraint (7), unless the relational data is derived based on (18). Hence, when the relational data is non-Euclidean, further improvements to the FRC algorithm [6] are required. It is expected that this algorithm will be applicable to relational data obtained in diverse applications, including problems in particle technology.

Acknowledgements: The authors wish to thank the referee for many editorial comments. Partial support from the New Jersey Commission on Science and Technology is gratefully acknowledged.

References
[1] J. C. Bezdek, Pattern Recognition with Fuzzy Objective Function Algorithms. New York: Plenum, 1981.
[2] J. C. Bezdek, R. J. Hathaway, and M. P. Windham, "Numerical comparison of the RFCM and AP algorithms for clustering relational data," Pattern Recognition, vol. 27, pp. 429-437, 1997.
[3] R. N. Davé, "Characterization and detection of noise in clustering," Pattern Recognition Letters, vol. 12, pp. 657-664, 1991.
[4] R. N. Davé and R. Krishnapuram, "Robust clustering methods: a unified view," IEEE Trans. Fuzzy Systems, vol. 5, pp. 270-293, May 1997.
[5] R. N. Davé and S. Sen, "On generalizing the noise clustering algorithm," in Proc. Seventh International Fuzzy Systems Association World Congress (IFSA '97), Prague, Czech Republic, June 1997, pp. 205-210.
[6] R. N. Davé and S. Sen, "Robust fuzzy clustering of relational data," submitted to IEEE Trans. Fuzzy Systems, 1997.
[7] K. C. Gowda and E. Diday, "Symbolic clustering using a new similarity measure," IEEE Trans. Systems, Man, and Cybernetics, vol. 22, pp. 368-378, 1992.
[8] R. J. Hathaway and J. C. Bezdek, "NERF c-means: non-Euclidean relational fuzzy clustering," Pattern Recognition, vol. 27, pp. 429-437, 1994.
[9] R. J. Hathaway, J. C. Bezdek, and J. W. Davenport, "On relational data versions of c-means algorithms," Pattern Recognition Letters, vol. 17, pp. 607-612, 1996.
[10] R. J. Hathaway, J. W. Davenport, and J. C. Bezdek, "Relational duals of the c-means clustering algorithms," Pattern Recognition, vol. 22, pp. 205-212, 1988.
[11] L. Kaufman and P. J. Rousseeuw, Finding Groups in Data: An Introduction to Cluster Analysis. New York: Wiley, 1990.
[12] M. Roubens, "Pattern classification problems and fuzzy sets," Fuzzy Sets and Systems, vol. 1, pp. 239-253, 1978.
[13] E. Ruspini, "Numerical methods for fuzzy clustering," Information Sciences, vol. 2, pp. 319-350, 1970.
[14] M. P. Windham, "Numerical classification of proximity data with assignment measures," Journal of Classification, vol. 2, pp. 157-172, 1985.