6. Images/cinvestav-
Therefore
It is possible to give a mathematical deļ¬nition of two important types
of clustering
Partitional Clustering
Hierarchical Clustering
Given
Given a set of input patterns X = {x1, x2, ..., xN }, where
xj = (x1j, x2j, ..., xdj)T
ā Rd, with each measure xij called a feature
(attribute, dimension, or variable).
5 / 37
7. Images/cinvestav-
Therefore
It is possible to give a mathematical deļ¬nition of two important types
of clustering
Partitional Clustering
Hierarchical Clustering
Given
Given a set of input patterns X = {x1, x2, ..., xN }, where
xj = (x1j, x2j, ..., xdj)T
ā Rd, with each measure xij called a feature
(attribute, dimension, or variable).
5 / 37
8. Images/cinvestav-
Therefore
It is possible to give a mathematical deļ¬nition of two important types
of clustering
Partitional Clustering
Hierarchical Clustering
Given
Given a set of input patterns X = {x1, x2, ..., xN }, where
xj = (x1j, x2j, ..., xdj)T
ā Rd, with each measure xij called a feature
(attribute, dimension, or variable).
5 / 37
17. Images/cinvestav-
Outline
1 Introduction
Introduction
2 Features
Types of Features
3 Similarity and Dissimilarity Measures
Basic Deļ¬nition
4 Proximity Measures between Two Points
Real-Valued Vectors
Dissimilarity Measures
Similarity Measures
Discrete-Valued Vectors
Dissimilarity Measures
Similarity Measures
5 Other Examples
Between Sets
9 / 37
18. Images/cinvestav-
Features
Usually
A data object is described by a set of features or variables, usually
represented as a multidimensional vector.
Thus
We tend to collect all that information as a pattern matrix of
dimensionality N Ć d.
Then
A feature can be classiļ¬ed as continuous, discrete, or binary.
10 / 37
19. Images/cinvestav-
Features
Usually
A data object is described by a set of features or variables, usually
represented as a multidimensional vector.
Thus
We tend to collect all that information as a pattern matrix of
dimensionality N Ć d.
Then
A feature can be classiļ¬ed as continuous, discrete, or binary.
10 / 37
20. Images/cinvestav-
Features
Usually
A data object is described by a set of features or variables, usually
represented as a multidimensional vector.
Thus
We tend to collect all that information as a pattern matrix of
dimensionality N Ć d.
Then
A feature can be classiļ¬ed as continuous, discrete, or binary.
10 / 37
21. Images/cinvestav-
Thus
Continuous
A continuous feature takes values from an uncountably inļ¬nite range set.
Discrete
Discrete features only have ļ¬nite, or at most, a countably inļ¬nite number
of values.
Binary
Binary or dichotomous features are a special case of discrete features when
they have exactly two values.
11 / 37
22. Images/cinvestav-
Thus
Continuous
A continuous feature takes values from an uncountably inļ¬nite range set.
Discrete
Discrete features only have ļ¬nite, or at most, a countably inļ¬nite number
of values.
Binary
Binary or dichotomous features are a special case of discrete features when
they have exactly two values.
11 / 37
23. Images/cinvestav-
Thus
Continuous
A continuous feature takes values from an uncountably inļ¬nite range set.
Discrete
Discrete features only have ļ¬nite, or at most, a countably inļ¬nite number
of values.
Binary
Binary or dichotomous features are a special case of discrete features when
they have exactly two values.
11 / 37
26. Images/cinvestav-
Measurement Level
Nominal
Features at this level are represented with labels, states, or names.
The numbers are meaningless in any mathematical sense and no
mathematical calculation can be made.
For example, peristalsis ā {hypermotile, normal, hypomotile, absent}.
Ordinal
Features at this level are also names, but with a certain order implied.
However, the diļ¬erence between the values is again meaningless.
For example, abdominal distension ā {slight, moderate, high}.
13 / 37
27. Images/cinvestav-
Measurement Level
Nominal
Features at this level are represented with labels, states, or names.
The numbers are meaningless in any mathematical sense and no
mathematical calculation can be made.
For example, peristalsis ā {hypermotile, normal, hypomotile, absent}.
Ordinal
Features at this level are also names, but with a certain order implied.
However, the diļ¬erence between the values is again meaningless.
For example, abdominal distension ā {slight, moderate, high}.
13 / 37
28. Images/cinvestav-
Measurement Level
Nominal
Features at this level are represented with labels, states, or names.
The numbers are meaningless in any mathematical sense and no
mathematical calculation can be made.
For example, peristalsis ā {hypermotile, normal, hypomotile, absent}.
Ordinal
Features at this level are also names, but with a certain order implied.
However, the diļ¬erence between the values is again meaningless.
For example, abdominal distension ā {slight, moderate, high}.
13 / 37
29. Images/cinvestav-
Measurement Level
Nominal
Features at this level are represented with labels, states, or names.
The numbers are meaningless in any mathematical sense and no
mathematical calculation can be made.
For example, peristalsis ā {hypermotile, normal, hypomotile, absent}.
Ordinal
Features at this level are also names, but with a certain order implied.
However, the diļ¬erence between the values is again meaningless.
For example, abdominal distension ā {slight, moderate, high}.
13 / 37
30. Images/cinvestav-
Measurement Level
Nominal
Features at this level are represented with labels, states, or names.
The numbers are meaningless in any mathematical sense and no
mathematical calculation can be made.
For example, peristalsis ā {hypermotile, normal, hypomotile, absent}.
Ordinal
Features at this level are also names, but with a certain order implied.
However, the diļ¬erence between the values is again meaningless.
For example, abdominal distension ā {slight, moderate, high}.
13 / 37
31. Images/cinvestav-
Measurement Level
Nominal
Features at this level are represented with labels, states, or names.
The numbers are meaningless in any mathematical sense and no
mathematical calculation can be made.
For example, peristalsis ā {hypermotile, normal, hypomotile, absent}.
Ordinal
Features at this level are also names, but with a certain order implied.
However, the diļ¬erence between the values is again meaningless.
For example, abdominal distension ā {slight, moderate, high}.
13 / 37
32. Images/cinvestav-
Measurement Level
Interval
Features at this level oļ¬er a meaningful interpretation of the
diļ¬erence between two values.
However, there exists no true zero and the ratio between two values is
meaningless.
For example, the concept cold can be seen as an interval.
Ratio
Features at this level possess all the properties of the above levels.
There is an absolute zero.
Thus, we have the meaning of ratio!!!
14 / 37
33. Images/cinvestav-
Measurement Level
Interval
Features at this level oļ¬er a meaningful interpretation of the
diļ¬erence between two values.
However, there exists no true zero and the ratio between two values is
meaningless.
For example, the concept cold can be seen as an interval.
Ratio
Features at this level possess all the properties of the above levels.
There is an absolute zero.
Thus, we have the meaning of ratio!!!
14 / 37
34. Images/cinvestav-
Measurement Level
Interval
Features at this level oļ¬er a meaningful interpretation of the
diļ¬erence between two values.
However, there exists no true zero and the ratio between two values is
meaningless.
For example, the concept cold can be seen as an interval.
Ratio
Features at this level possess all the properties of the above levels.
There is an absolute zero.
Thus, we have the meaning of ratio!!!
14 / 37
35. Images/cinvestav-
Measurement Level
Interval
Features at this level oļ¬er a meaningful interpretation of the
diļ¬erence between two values.
However, there exists no true zero and the ratio between two values is
meaningless.
For example, the concept cold can be seen as an interval.
Ratio
Features at this level possess all the properties of the above levels.
There is an absolute zero.
Thus, we have the meaning of ratio!!!
14 / 37
36. Images/cinvestav-
Measurement Level
Interval
Features at this level oļ¬er a meaningful interpretation of the
diļ¬erence between two values.
However, there exists no true zero and the ratio between two values is
meaningless.
For example, the concept cold can be seen as an interval.
Ratio
Features at this level possess all the properties of the above levels.
There is an absolute zero.
Thus, we have the meaning of ratio!!!
14 / 37
37. Images/cinvestav-
Measurement Level
Interval
Features at this level oļ¬er a meaningful interpretation of the
diļ¬erence between two values.
However, there exists no true zero and the ratio between two values is
meaningless.
For example, the concept cold can be seen as an interval.
Ratio
Features at this level possess all the properties of the above levels.
There is an absolute zero.
Thus, we have the meaning of ratio!!!
14 / 37
38. Images/cinvestav-
Outline
1 Introduction
Introduction
2 Features
Types of Features
3 Similarity and Dissimilarity Measures
Basic Deļ¬nition
4 Proximity Measures between Two Points
Real-Valued Vectors
Dissimilarity Measures
Similarity Measures
Discrete-Valued Vectors
Dissimilarity Measures
Similarity Measures
5 Other Examples
Between Sets
15 / 37
39. Images/cinvestav-
Similarity Measures
Deļ¬nition
A Similarity Measure s on X is a function
s : X Ć X ā R (1)
Such that
1 s (x, y) ā¤ s0.
2 s (x, x) = s0 for all x ā X.
3 s (x, y) = s (y, x) for all x, y ā X.
4 If s (x, y) = s0 āā x = y.
5 s (x, y) s (y, z) ā¤ s (x, z) [s (x, y) + s (y, z)] for all x, y, z ā X.
16 / 37
40. Images/cinvestav-
Similarity Measures
Deļ¬nition
A Similarity Measure s on X is a function
s : X Ć X ā R (1)
Such that
1 s (x, y) ā¤ s0.
2 s (x, x) = s0 for all x ā X.
3 s (x, y) = s (y, x) for all x, y ā X.
4 If s (x, y) = s0 āā x = y.
5 s (x, y) s (y, z) ā¤ s (x, z) [s (x, y) + s (y, z)] for all x, y, z ā X.
16 / 37
41. Images/cinvestav-
Similarity Measures
Deļ¬nition
A Similarity Measure s on X is a function
s : X Ć X ā R (1)
Such that
1 s (x, y) ā¤ s0.
2 s (x, x) = s0 for all x ā X.
3 s (x, y) = s (y, x) for all x, y ā X.
4 If s (x, y) = s0 āā x = y.
5 s (x, y) s (y, z) ā¤ s (x, z) [s (x, y) + s (y, z)] for all x, y, z ā X.
16 / 37
42. Images/cinvestav-
Similarity Measures
Deļ¬nition
A Similarity Measure s on X is a function
s : X Ć X ā R (1)
Such that
1 s (x, y) ā¤ s0.
2 s (x, x) = s0 for all x ā X.
3 s (x, y) = s (y, x) for all x, y ā X.
4 If s (x, y) = s0 āā x = y.
5 s (x, y) s (y, z) ā¤ s (x, z) [s (x, y) + s (y, z)] for all x, y, z ā X.
16 / 37
43. Images/cinvestav-
Similarity Measures
Deļ¬nition
A Similarity Measure s on X is a function
s : X Ć X ā R (1)
Such that
1 s (x, y) ā¤ s0.
2 s (x, x) = s0 for all x ā X.
3 s (x, y) = s (y, x) for all x, y ā X.
4 If s (x, y) = s0 āā x = y.
5 s (x, y) s (y, z) ā¤ s (x, z) [s (x, y) + s (y, z)] for all x, y, z ā X.
16 / 37
44. Images/cinvestav-
Similarity Measures
Deļ¬nition
A Similarity Measure s on X is a function
s : X Ć X ā R (1)
Such that
1 s (x, y) ā¤ s0.
2 s (x, x) = s0 for all x ā X.
3 s (x, y) = s (y, x) for all x, y ā X.
4 If s (x, y) = s0 āā x = y.
5 s (x, y) s (y, z) ā¤ s (x, z) [s (x, y) + s (y, z)] for all x, y, z ā X.
16 / 37
45. Images/cinvestav-
Dissimilarity Measures
Deļ¬ntion
A Dissimilarity Measure d on X is a function
d : X Ć X ā R (2)
Such that
1 d (x, y) ā„ d0 for all x, y ā X.
2 d (x, x) = d0 for all x ā X.
3 d (x, y) = d (y, x) for all x, y ā X.
4 If d (x, y) = d0 āā x = y.
5 d (x, z) ā¤ d (x, y) + d (y, z) for all x, y, z ā X.
17 / 37
46. Images/cinvestav-
Dissimilarity Measures
Deļ¬ntion
A Dissimilarity Measure d on X is a function
d : X Ć X ā R (2)
Such that
1 d (x, y) ā„ d0 for all x, y ā X.
2 d (x, x) = d0 for all x ā X.
3 d (x, y) = d (y, x) for all x, y ā X.
4 If d (x, y) = d0 āā x = y.
5 d (x, z) ā¤ d (x, y) + d (y, z) for all x, y, z ā X.
17 / 37
47. Images/cinvestav-
Dissimilarity Measures
Deļ¬ntion
A Dissimilarity Measure d on X is a function
d : X Ć X ā R (2)
Such that
1 d (x, y) ā„ d0 for all x, y ā X.
2 d (x, x) = d0 for all x ā X.
3 d (x, y) = d (y, x) for all x, y ā X.
4 If d (x, y) = d0 āā x = y.
5 d (x, z) ā¤ d (x, y) + d (y, z) for all x, y, z ā X.
17 / 37
48. Images/cinvestav-
Dissimilarity Measures
Deļ¬ntion
A Dissimilarity Measure d on X is a function
d : X Ć X ā R (2)
Such that
1 d (x, y) ā„ d0 for all x, y ā X.
2 d (x, x) = d0 for all x ā X.
3 d (x, y) = d (y, x) for all x, y ā X.
4 If d (x, y) = d0 āā x = y.
5 d (x, z) ā¤ d (x, y) + d (y, z) for all x, y, z ā X.
17 / 37
49. Images/cinvestav-
Dissimilarity Measures
Deļ¬ntion
A Dissimilarity Measure d on X is a function
d : X Ć X ā R (2)
Such that
1 d (x, y) ā„ d0 for all x, y ā X.
2 d (x, x) = d0 for all x ā X.
3 d (x, y) = d (y, x) for all x, y ā X.
4 If d (x, y) = d0 āā x = y.
5 d (x, z) ā¤ d (x, y) + d (y, z) for all x, y, z ā X.
17 / 37
50. Images/cinvestav-
Dissimilarity Measures
Deļ¬ntion
A Dissimilarity Measure d on X is a function
d : X Ć X ā R (2)
Such that
1 d (x, y) ā„ d0 for all x, y ā X.
2 d (x, x) = d0 for all x ā X.
3 d (x, y) = d (y, x) for all x, y ā X.
4 If d (x, y) = d0 āā x = y.
5 d (x, z) ā¤ d (x, y) + d (y, z) for all x, y, z ā X.
17 / 37
51. Images/cinvestav-
Outline
1 Introduction
Introduction
2 Features
Types of Features
3 Similarity and Dissimilarity Measures
Basic Deļ¬nition
4 Proximity Measures between Two Points
Real-Valued Vectors
Dissimilarity Measures
Similarity Measures
Discrete-Valued Vectors
Dissimilarity Measures
Similarity Measures
5 Other Examples
Between Sets
18 / 37
52. Images/cinvestav-
Dissimilarity Measures
The Hamming distance for {0, 1}
The vectors here are collections of zeros and ones. Thus,
dH (x, y) =
d
i=1
(xi ā yi)2
(3)
Euclidean distance
d (x, y) =
d
i=1
|xi ā yi|
1
2
2
(4)
19 / 37
53. Images/cinvestav-
Dissimilarity Measures
The Hamming distance for {0, 1}
The vectors here are collections of zeros and ones. Thus,
dH (x, y) =
d
i=1
(xi ā yi)2
(3)
Euclidean distance
d (x, y) =
d
i=1
|xi ā yi|
1
2
2
(4)
19 / 37
54. Images/cinvestav-
Dissimilarity Measures
The weighted lp metric DMs
dp (x, y) =
d
i=1
wi |xi ā yi|p
1
p
(5)
Where
Where xi, yi are the ith coordinates of x and y, i = 1, ..., d and wi ā„ 0 is
the ith weight coeļ¬cient.
Important
A well-known representative of the latter category of measures is the
Euclidean distance.
20 / 37
55. Images/cinvestav-
Dissimilarity Measures
The weighted lp metric DMs
dp (x, y) =
d
i=1
wi |xi ā yi|p
1
p
(5)
Where
Where xi, yi are the ith coordinates of x and y, i = 1, ..., d and wi ā„ 0 is
the ith weight coeļ¬cient.
Important
A well-known representative of the latter category of measures is the
Euclidean distance.
20 / 37
56. Images/cinvestav-
Dissimilarity Measures
The weighted lp metric DMs
dp (x, y) =
d
i=1
wi |xi ā yi|p
1
p
(5)
Where
Where xi, yi are the ith coordinates of x and y, i = 1, ..., d and wi ā„ 0 is
the ith weight coeļ¬cient.
Important
A well-known representative of the latter category of measures is the
Euclidean distance.
20 / 37
57. Images/cinvestav-
Dissimilarity Measures
The weighted l2 metric DM can be further generalized as follows
d (x, y) = (x ā y)T
B (x ā y) (6)
Where B is a symmetric, positive deļ¬nite matrix
A special case
This includes the Mahalanobis distance as a special case
21 / 37
58. Images/cinvestav-
Dissimilarity Measures
The weighted l2 metric DM can be further generalized as follows
d (x, y) = (x ā y)T
B (x ā y) (6)
Where B is a symmetric, positive deļ¬nite matrix
A special case
This includes the Mahalanobis distance as a special case
21 / 37
59. Images/cinvestav-
Dissimilarity Measures
Special lp metric DMs that are also encountered in practice are
d1 (x, y) =
d
i=1
wi |xi ā yi| (Weighted Manhattan Norm)
dā (x, y) = max
1ā¤iā¤d
wi |xi ā yi| (Weighted lā Norm)
22 / 37
60. Images/cinvestav-
Dissimilarity Measures
Special lp metric DMs that are also encountered in practice are
d1 (x, y) =
d
i=1
wi |xi ā yi| (Weighted Manhattan Norm)
dā (x, y) = max
1ā¤iā¤d
wi |xi ā yi| (Weighted lā Norm)
22 / 37
61. Images/cinvestav-
Similarity Measures
The inner product
sinner (x, y) = xT
y =
d
i=1
xiyi (7)
Closely related to the inner product is the cosine similarity measure
scosine (x, y) =
xT y
x y
(8)
Where x = d
i=1 x2
i and y = d
i=1 y2
i , the length of the
vectors!!!
23 / 37
62. Images/cinvestav-
Similarity Measures
The inner product
sinner (x, y) = xT
y =
d
i=1
xiyi (7)
Closely related to the inner product is the cosine similarity measure
scosine (x, y) =
xT y
x y
(8)
Where x = d
i=1 x2
i and y = d
i=1 y2
i , the length of the
vectors!!!
23 / 37
63. Images/cinvestav-
Similarity Measures
Pearsonās correlation coeļ¬cient
rPearson (x, y) =
(x ā x)T
(y ā y)
x y
(9)
Where
x =
1
d
d
i=1
xi and y =
1
d
d
i=1
yi (10)
Properties
It is clear that rPearson (x, y) takes values between -1 and +1.
The diļ¬erence between sinner and rPearson does not depend directly
on x and y but on their corresponding diļ¬erence vectors.
24 / 37
64. Images/cinvestav-
Similarity Measures
Pearsonās correlation coeļ¬cient
rPearson (x, y) =
(x ā x)T
(y ā y)
x y
(9)
Where
x =
1
d
d
i=1
xi and y =
1
d
d
i=1
yi (10)
Properties
It is clear that rPearson (x, y) takes values between -1 and +1.
The diļ¬erence between sinner and rPearson does not depend directly
on x and y but on their corresponding diļ¬erence vectors.
24 / 37
65. Images/cinvestav-
Similarity Measures
Pearsonās correlation coeļ¬cient
rPearson (x, y) =
(x ā x)T
(y ā y)
x y
(9)
Where
x =
1
d
d
i=1
xi and y =
1
d
d
i=1
yi (10)
Properties
It is clear that rPearson (x, y) takes values between -1 and +1.
The diļ¬erence between sinner and rPearson does not depend directly
on x and y but on their corresponding diļ¬erence vectors.
24 / 37
66. Images/cinvestav-
Similarity Measures
Pearsonās correlation coeļ¬cient
rPearson (x, y) =
(x ā x)T
(y ā y)
x y
(9)
Where
x =
1
d
d
i=1
xi and y =
1
d
d
i=1
yi (10)
Properties
It is clear that rPearson (x, y) takes values between -1 and +1.
The diļ¬erence between sinner and rPearson does not depend directly
on x and y but on their corresponding diļ¬erence vectors.
24 / 37
67. Images/cinvestav-
Similarity Measures
The Tanimoto measure or distance
sT (x, y) =
xT y
x 2
+ y 2
ā xT y
(11)
Something Notable
sT (x, y) =
1
1 + (xāy)T
(xāy)
xT y
(12)
Meaning
The Tanimoto measure is inversely proportional to the squared Euclidean
distance.
25 / 37
68. Images/cinvestav-
Similarity Measures
The Tanimoto measure or distance
sT (x, y) =
xT y
x 2
+ y 2
ā xT y
(11)
Something Notable
sT (x, y) =
1
1 + (xāy)T
(xāy)
xT y
(12)
Meaning
The Tanimoto measure is inversely proportional to the squared Euclidean
distance.
25 / 37
69. Images/cinvestav-
Similarity Measures
The Tanimoto measure or distance
sT (x, y) =
xT y
x 2
+ y 2
ā xT y
(11)
Something Notable
sT (x, y) =
1
1 + (xāy)T
(xāy)
xT y
(12)
Meaning
The Tanimoto measure is inversely proportional to the squared Euclidean
distance.
25 / 37
70. Images/cinvestav-
Outline
1 Introduction
Introduction
2 Features
Types of Features
3 Similarity and Dissimilarity Measures
Basic Deļ¬nition
4 Proximity Measures between Two Points
Real-Valued Vectors
Dissimilarity Measures
Similarity Measures
Discrete-Valued Vectors
Dissimilarity Measures
Similarity Measures
5 Other Examples
Between Sets
26 / 37
71. Images/cinvestav-
We consider now vectors whose coordinates belong to a
ļ¬nite set
Given F = {0, 1, 2, ..., k ā 1} where k is a positive integer
There are exactly kd vectors x ā Fd
Now consider x, y ā Fd
With A (x, y) = [aij], i, j = 0, 1, ..., k ā 1 be a k Ć k matrix
Where
The element aij is the number of places where the ļ¬rst vector has the i
symbol and the corresponding element of the second vector has the j
symbol.
27 / 37
72. Images/cinvestav-
We consider now vectors whose coordinates belong to a
ļ¬nite set
Given F = {0, 1, 2, ..., k ā 1} where k is a positive integer
There are exactly kd vectors x ā Fd
Now consider x, y ā Fd
With A (x, y) = [aij], i, j = 0, 1, ..., k ā 1 be a k Ć k matrix
Where
The element aij is the number of places where the ļ¬rst vector has the i
symbol and the corresponding element of the second vector has the j
symbol.
27 / 37
73. Images/cinvestav-
We consider now vectors whose coordinates belong to a
ļ¬nite set
Given F = {0, 1, 2, ..., k ā 1} where k is a positive integer
There are exactly kd vectors x ā Fd
Now consider x, y ā Fd
With A (x, y) = [aij], i, j = 0, 1, ..., k ā 1 be a k Ć k matrix
Where
The element aij is the number of places where the ļ¬rst vector has the i
symbol and the corresponding element of the second vector has the j
symbol.
27 / 37
74. Images/cinvestav-
Using Indicator functions
We can think of each element aij
aij =
d
k=
I [(i, j) == (xk, yk)] (13)
Thanks to Ricardo Llamas and Gibran Felix!!! Class Summer 2015.
28 / 37
75. Images/cinvestav-
Contingency Matrix
Thus
This matrix is called a contingency table.
For example if d = 6 and k = 3
With x = [0, 1, 2, 1, 2, 1]T
and y = [1, 0, 2, 1, 0, 1]T
Then
A (x, y) =
ļ£®
ļ£Æ
ļ£°
0 1 0
1 2 0
1 0 1
ļ£¹
ļ£ŗ
ļ£» (14)
29 / 37
76. Images/cinvestav-
Contingency Matrix
Thus
This matrix is called a contingency table.
For example if d = 6 and k = 3
With x = [0, 1, 2, 1, 2, 1]T
and y = [1, 0, 2, 1, 0, 1]T
Then
A (x, y) =
ļ£®
ļ£Æ
ļ£°
0 1 0
1 2 0
1 0 1
ļ£¹
ļ£ŗ
ļ£» (14)
29 / 37
77. Images/cinvestav-
Contingency Matrix
Thus
This matrix is called a contingency table.
For example if d = 6 and k = 3
With x = [0, 1, 2, 1, 2, 1]T
and y = [1, 0, 2, 1, 0, 1]T
Then
A (x, y) =
ļ£®
ļ£Æ
ļ£°
0 1 0
1 2 0
1 0 1
ļ£¹
ļ£ŗ
ļ£» (14)
29 / 37
78. Images/cinvestav-
Thus, we have that
Easy to verify that
kā1
i=0
kā1
j=0
aij = d (15)
Something Notable
Most of the proximity measures between two discrete-valued vectors may
be expressed as combinations of elements of matrix A (x, y).
30 / 37
79. Images/cinvestav-
Thus, we have that
Easy to verify that
kā1
i=0
kā1
j=0
aij = d (15)
Something Notable
Most of the proximity measures between two discrete-valued vectors may
be expressed as combinations of elements of matrix A (x, y).
30 / 37
80. Images/cinvestav-
Dissimilarity Measures
The Hamming distance
It is deļ¬ned as the number of places where two vectors diļ¬er:
d (x, y) =
kā1
i=0
kā1
j=0,j=i
aij (16)
The summation of all the oļ¬-diagonal elements of A, which indicate the
positions where x and y diļ¬er.
Special case k = 2, thus vectors are binary valued
The vectors x ā Fd are binary valued and the Hamming distance becomes
dH (x, y) =
d
i=1
(xi + yi ā 2xiyi)
=
d
i=1
(xi ā yi)2
31 / 37
81. Images/cinvestav-
Dissimilarity Measures
The Hamming distance
It is deļ¬ned as the number of places where two vectors diļ¬er:
d (x, y) =
kā1
i=0
kā1
j=0,j=i
aij (16)
The summation of all the oļ¬-diagonal elements of A, which indicate the
positions where x and y diļ¬er.
Special case k = 2, thus vectors are binary valued
The vectors x ā Fd are binary valued and the Hamming distance becomes
dH (x, y) =
d
i=1
(xi + yi ā 2xiyi)
=
d
i=1
(xi ā yi)2
31 / 37
86. Images/cinvestav-
Thus
We now deļ¬ne
nx = kā1
i=1
kā1
j=0 aij and ny = kā1
i=0
kā1
j=1 aij
In other words
nx and ny denotes the number of the nonzero coordinate of x and y.
Thus, we have
sT (x, y) ==
kā1
i=1
kā1
j=i aii
nX + nY ā kā1
i=1
kā1
j=i aij
(19)
34 / 37
87. Images/cinvestav-
Thus
We now deļ¬ne
nx = kā1
i=1
kā1
j=0 aij and ny = kā1
i=0
kā1
j=1 aij
In other words
nx and ny denotes the number of the nonzero coordinate of x and y.
Thus, we have
sT (x, y) ==
kā1
i=1
kā1
j=i aii
nX + nY ā kā1
i=1
kā1
j=i aij
(19)
34 / 37
88. Images/cinvestav-
Thus
We now deļ¬ne
nx = kā1
i=1
kā1
j=0 aij and ny = kā1
i=0
kā1
j=1 aij
In other words
nx and ny denotes the number of the nonzero coordinate of x and y.
Thus, we have
sT (x, y) ==
kā1
i=1
kā1
j=i aii
nX + nY ā kā1
i=1
kā1
j=i aij
(19)
34 / 37
89. Images/cinvestav-
Outline
1 Introduction
Introduction
2 Features
Types of Features
3 Similarity and Dissimilarity Measures
Basic Deļ¬nition
4 Proximity Measures between Two Points
Real-Valued Vectors
Dissimilarity Measures
Similarity Measures
Discrete-Valued Vectors
Dissimilarity Measures
Similarity Measures
5 Other Examples
Between Sets
35 / 37
92. Images/cinvestav-
Although there are more stuļ¬ to look for
Please Refer to
1 Theodoridisā Book Chapter 11.
2 Rui Xu and Don Wunsch. 2009. Clustering. Wiley-IEEE Press.
37 / 37