SCIS&ISIS 2016, 2016.08.27
Families of Triangular Norm Based Kernel Function
and Its Application to Kernel k-means
Kazushi Okamoto
The University of Electro-Communications
Introduction
• A kernel method is a fundamental and important pattern-analysis approach based on a kernel function
• It is used in machine learning tasks such as classification, clustering, and dimensionality reduction
• A kernel function corresponds to a similarity measure between two data points:
  • it maps each data point to a high-dimensional feature space
  • and takes the inner product in that space
SCIS&ISIS20162016.08.27 / 22
Existing Kernel Functions

linear kernel:        K_{lin}(x, y) = \sum_{i=1}^{d} x_i y_i
polynomial kernel:    K_{pol}(x, y) = \left( \sum_{i=1}^{d} x_i y_i + l \right)^p
RBF kernel:           K_{rbf}(x, y) = \exp\left( -\frac{\sum_{i=1}^{d} (x_i - y_i)^2}{\sigma^2} \right)
intersection kernel:  K_{int}(x, y) = \sum_{i=1}^{d} \min\{x_i, y_i\}

The minimum and product operations are both triangular norms (generalizations of the intersection operation).
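The four kernels can be written directly in Python; a minimal sketch (function and parameter names are ours, not from the talk):

```python
import math

def k_linear(x, y):
    """Linear kernel: sum_i x_i * y_i."""
    return sum(xi * yi for xi, yi in zip(x, y))

def k_poly(x, y, l=1.0, p=2):
    """Polynomial kernel: (sum_i x_i * y_i + l)^p."""
    return (k_linear(x, y) + l) ** p

def k_rbf(x, y, sigma=1.0):
    """RBF kernel: exp(-sum_i (x_i - y_i)^2 / sigma^2)."""
    return math.exp(-sum((xi - yi) ** 2 for xi, yi in zip(x, y)) / sigma ** 2)

def k_intersection(x, y):
    """Intersection kernel: sum_i min(x_i, y_i)."""
    return sum(min(xi, yi) for xi, yi in zip(x, y))
```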
Positive Semi-Definite Kernel

A function K : \Omega \times \Omega \to \mathbb{R} is a positive semi-definite kernel if and only if, for all m \in \mathbb{N}^+ and all x, y, x_1, x_2, \cdots, x_m \in \Omega:

1) K(x, y) = K(y, x)
2) \sum_{i=1}^{m} \sum_{j=1}^{m} c_i c_j K(x_i, x_j) \geq 0 for all c_i, c_j \in \mathbb{R} (quadratic form)

The set \Omega may be, for example, real-valued vectors, graphs, or strings.
Kernel evaluation corresponds to an inner product in the feature space \phi: K(x, y) = \phi(x) \cdot \phi(y)
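Condition 2) can be probed numerically by evaluating the quadratic form for random coefficient vectors. This is only a necessary-condition check, not a proof; a minimal sketch (names are ours):

```python
import random

def quadratic_form(kernel, xs, cs):
    """Compute sum_i sum_j c_i c_j K(x_i, x_j)."""
    return sum(ci * cj * kernel(xi, xj)
               for ci, xi in zip(cs, xs)
               for cj, xj in zip(cs, xs))

def probe_psd(kernel, xs, trials=200, seed=0):
    """Return False if some random coefficient vector makes the form negative."""
    rng = random.Random(seed)
    for _ in range(trials):
        cs = [rng.uniform(-1, 1) for _ in xs]
        if quadratic_form(kernel, xs, cs) < -1e-9:
            return False
    return True
```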
Additive Kernel

K : \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}, built from per-coordinate kernels K_i : \mathbb{R} \times \mathbb{R} \to \mathbb{R}:

K(x, y) = \sum_{i=1}^{d} K_i(x_i, y_i) for all x, y \in \mathbb{R}^d

If every K_i is positive semi-definite, then the additive kernel K is also a positive semi-definite kernel, since

\sum_{j=1}^{m} \sum_{k=1}^{m} c_j c_k K(x_j, x_k) = \sum_{j=1}^{m} \sum_{k=1}^{m} c_j c_k \sum_{i=1}^{d} K_i(x_{j,i}, x_{k,i}) = \sum_{i=1}^{d} \sum_{j=1}^{m} \sum_{k=1}^{m} c_j c_k K_i(x_{j,i}, x_{k,i}) \geq 0

where x_{j,i} denotes the i-th component of x_j.
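The additive construction is mechanical: sum per-coordinate kernels. A minimal sketch (names are ours):

```python
def additive_kernel(coord_kernels):
    """Build K(x, y) = sum_i K_i(x_i, y_i) from a list of per-coordinate kernels."""
    def K(x, y):
        return sum(Ki(xi, yi) for Ki, xi, yi in zip(coord_kernels, x, y))
    return K
```

With K_i = min in every coordinate this recovers the intersection kernel.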
Triangular Norm (t-norm)

A function T : [0, 1] \times [0, 1] \to [0, 1] is called a t-norm if and only if, for all x, y, z \in [0, 1]:

1) T(x, 1) = x (boundary condition)
2) T(x, y) = T(y, x) (commutativity)
3) T(x, T(y, z)) = T(T(x, y), z) (associativity)
4) T(x, y) \leq T(x, z) if y \leq z (monotonicity)

In fuzzy logic, t-norms represent intersection operations.
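The four axioms can be verified approximately on a finite grid. A small checker (ours), which the algebraic product T(x, y) = xy and the minimum both pass:

```python
def is_tnorm_on_grid(T, steps=10, tol=1e-9):
    """Check the four t-norm axioms on a uniform grid over [0, 1]^2 (and [0, 1]^3)."""
    grid = [i / steps for i in range(steps + 1)]
    for x in grid:
        if abs(T(x, 1.0) - x) > tol:                  # 1) boundary condition
            return False
        for y in grid:
            if abs(T(x, y) - T(y, x)) > tol:          # 2) commutativity
                return False
            for z in grid:
                if abs(T(x, T(y, z)) - T(T(x, y), z)) > tol:  # 3) associativity
                    return False
                if y <= z and T(x, y) > T(x, z) + tol:        # 4) monotonicity
                    return False
    return True
```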
Example of t-norms

Hamacher t-norm:  T_h(x, y) = \frac{xy}{p + (1 - p)(x + y - xy)}, p \in [0, 1]
Dubois t-norm:    T_{db}(x, y) = \frac{xy}{\max\{x, y, p\}}, p \in [0, 1]

[Surface plots of the Hamacher t-norm (p = 0.4) and the Dubois t-norm (p = 0.4) over (x, y) \in [0, 1]^2]
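Both t-norms are straightforward to implement; a sketch with the slides' p = 0.4 as the default (function names are ours):

```python
def t_hamacher(x, y, p=0.4):
    """Hamacher t-norm: xy / (p + (1 - p)(x + y - xy)), p in [0, 1]."""
    denom = p + (1 - p) * (x + y - x * y)
    return 0.0 if denom == 0 else x * y / denom   # denom = 0 only for p = 0, x = y = 0

def t_dubois(x, y, p=0.4):
    """Dubois t-norm: xy / max(x, y, p), p in [0, 1]."""
    return x * y / max(x, y, p)
```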
Is t-norm Positive Semi-Definite on [0, 1] ?

The quadratic form \sum_{i=1}^{m} \sum_{j=1}^{m} c_i c_j T(x_i, x_j) is nonnegative for all c_i, c_j \in \mathbb{R} and all m \in \mathbb{N}^+
\iff the Gram matrix A is positive semi-definite
\iff all principal minors of A are \geq 0, where

A = \begin{pmatrix}
T(x_1, x_1) & T(x_1, x_2) & \cdots & T(x_1, x_m) \\
T(x_2, x_1) & T(x_2, x_2) & \cdots & T(x_2, x_m) \\
\vdots & \vdots & \ddots & \vdots \\
T(x_m, x_1) & T(x_m, x_2) & \cdots & T(x_m, x_m)
\end{pmatrix}
Is t-norm Positive Semi-Definite on [0, 1] ?

m = 1: \sum_{i=1}^{1} \sum_{j=1}^{1} c_i c_j T(x_i, x_j) = c_1^2 T(x_1, x_1) \geq 0

m = 2: all principal minors of \begin{pmatrix} T(x_1, x_1) & T(x_1, x_2) \\ T(x_2, x_1) & T(x_2, x_2) \end{pmatrix} must be \geq 0:

|T(x_1, x_1)| \geq 0, |T(x_2, x_2)| \geq 0, and the 2 \times 2 determinant
\begin{vmatrix} T(x_1, x_1) & T(x_1, x_2) \\ T(x_2, x_1) & T(x_2, x_2) \end{vmatrix}
Is t-norm Positive Semi-Definite on [0, 1] ?

\begin{vmatrix} T(x_1, x_1) & T(x_1, x_2) \\ T(x_2, x_1) & T(x_2, x_2) \end{vmatrix} = T(x_1, x_1) T(x_2, x_2) - T^2(x_1, x_2)

Assumption: T(x, y) \geq T_a(x, y) = xy for all x, y, z \implies z \cdot T(x, y) \leq T(x, zy)

For 0 < x_1 < x_2: since T(0, x_2) = 0 < x_1 < x_2 = T(1, x_2), there exists w with x_1 = T(w, x_2). By monotonicity, T(w, T(x_1, x_2)) \leq T(w, T(x_2, x_2)), and

\frac{T(x_2, x_2)}{T(x_1, x_2)} \geq \frac{T(w, T(x_2, x_2))}{T(w, T(x_1, x_2))} = \frac{T(T(w, x_2), x_2)}{T(T(w, x_2), x_1)} = \frac{T(x_1, x_2)}{T(x_1, x_1)}

Hence T(x_1, x_1) T(x_2, x_2) \geq T^2(x_1, x_2), so the 2 \times 2 determinant is nonnegative.
Kernel k-means

A partitioning algorithm for x_1, x_2, \cdots, x_n \in \mathbb{R}^d that minimizes the objective function

J = \min \sum_{\mu_i \in M} \sum_{x \in C_i} ||\phi(x) - \mu_i||^2

where \mu_i is the centroid of cluster C_i in the feature space.

Kernel trick: each squared distance can be evaluated from kernel values only:

||\phi(x) - \mu_i||^2 = \left|\left| \phi(x) - \frac{1}{|C_i|} \sum_{x' \in C_i} \phi(x') \right|\right|^2
= K(x, x) - \frac{2}{|C_i|} \sum_{x' \in C_i} K(x, x') + \frac{1}{|C_i|^2} \sum_{x' \in C_i} \sum_{x'' \in C_i} K(x', x'')
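The assignment rule, moving each point to the cluster minimizing this kernelized distance, can be sketched as follows (a simplified implementation of ours, not the talk's code; the talk's actual termination and restart conditions appear on the next slide):

```python
import random

def kernel_kmeans(K, n, k, iters=100, seed=0):
    """Kernel k-means on a precomputed n x n Gram matrix K (list of lists)."""
    rng = random.Random(seed)
    labels = [rng.randrange(k) for _ in range(n)]
    for _ in range(iters):
        clusters = [[i for i in range(n) if labels[i] == c] for c in range(k)]
        # Intra-cluster term: (1 / |C|^2) * sum_{x', x'' in C} K(x', x'')
        intra = [sum(K[i][j] for i in C for j in C) / len(C) ** 2 if C else 0.0
                 for C in clusters]
        new_labels = []
        for i in range(n):
            best_c, best_d = labels[i], float("inf")
            for c, C in enumerate(clusters):
                if not C:
                    continue
                # K(x,x) - (2/|C|) sum K(x,x') + intra term
                d = K[i][i] - 2.0 * sum(K[i][j] for j in C) / len(C) + intra[c]
                if d < best_d:
                    best_c, best_d = c, d
            new_labels.append(best_c)
        if new_labels == labels:
            break
        labels = new_labels
    return labels
```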
Conditions of Clustering Experiment
• each clustering process was terminated when the number of iterations reached 1,000 or when the difference between the previous and current objective function values fell below 10^{-4}
• the partition that minimized the objective function was selected from 100 runs with different initial partitions
• the number of clusters was set depending on the data set
Applied Kernel Functions

linear kernel:  K_{lin}(x, y) = \sum_{i=1}^{d} x_i y_i
RBF kernel:     K_{rbf}(x, y) = \exp\left( -\frac{\sum_{i=1}^{d} (x_i - y_i)^2}{\sigma^2} \right)
t-norm kernel:  K_t(x, y) = \sum_{i=1}^{d} T_i(x_i, y_i)

Applied non-parameterized t-norms:

logical product:   T_l(x, y) = \min\{x, y\}
Mizumoto product:  T_{mp}(x, y) = \frac{2}{\pi} \cot^{-1}\left( \cot\frac{\pi x}{2} + \cot\frac{\pi y}{2} \right)
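The two non-parameterized t-norms and the t-norm kernel itself can be sketched as follows (function names are ours; `math.atan2(1, s)` serves as arccot with range (0, π)):

```python
import math

def t_logical(x, y):
    """Logical product: min(x, y)."""
    return min(x, y)

def t_mizumoto(x, y):
    """Mizumoto product: (2/pi) * arccot( cot(pi*x/2) + cot(pi*y/2) )."""
    if x == 0.0 or y == 0.0:
        return 0.0                        # cot(0) diverges; T(0, y) = 0 for any t-norm
    cot = lambda t: math.cos(t) / math.sin(t)
    s = cot(math.pi * x / 2) + cot(math.pi * y / 2)
    return (2.0 / math.pi) * math.atan2(1.0, s)

def tnorm_kernel(T, x, y):
    """t-norm kernel: K_t(x, y) = sum_i T(x_i, y_i), for x, y in [0, 1]^d."""
    return sum(T(xi, yi) for xi, yi in zip(x, y))
```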
Applied Parameterized t-norms

Dombi t-norm:        T_{dm}(x, y) = \frac{1}{1 + \sqrt[p]{ \left(\frac{1-x}{x}\right)^p + \left(\frac{1-y}{y}\right)^p }}
Dubois t-norm:       T_{db}(x, y) = \frac{xy}{\max\{x, y, p\}}
Frank t-norm:        T_f(x, y) = \log_p\left( 1 + \frac{(p^x - 1)(p^y - 1)}{p - 1} \right)
Hamacher t-norm:     T_h(x, y) = \frac{xy}{p + (1 - p)(x + y - xy)}
Schweizer t-norm 2:  T_{s2}(x, y) = \frac{1}{\sqrt[p]{ \frac{1}{x^p} + \frac{1}{y^p} - 1 }}
Schweizer t-norm 3:  T_{s3}(x, y) = 1 - \sqrt[p]{ (1 - x)^p + (1 - y)^p - (1 - x)^p (1 - y)^p }
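Sketches of the remaining parameterized t-norms (Dubois and Hamacher appear on an earlier slide; function names are ours, and the parameter ranges in the docstrings are the standard ones, not stated on the slide):

```python
import math

def t_dombi(x, y, p):
    """Dombi t-norm, p > 0."""
    if x == 0.0 or y == 0.0:
        return 0.0                        # avoid division by zero; T(0, y) = 0
    s = ((1 - x) / x) ** p + ((1 - y) / y) ** p
    return 1.0 / (1.0 + s ** (1.0 / p))

def t_frank(x, y, p):
    """Frank t-norm, p > 0, p != 1."""
    return math.log(1 + (p ** x - 1) * (p ** y - 1) / (p - 1), p)

def t_schweizer2(x, y, p):
    """Schweizer t-norm 2, p > 0."""
    if x == 0.0 or y == 0.0:
        return 0.0                        # avoid division by zero; T(0, y) = 0
    return (1.0 / x ** p + 1.0 / y ** p - 1.0) ** (-1.0 / p)

def t_schweizer3(x, y, p):
    """Schweizer t-norm 3, p > 0."""
    a, b = (1 - x) ** p, (1 - y) ** p
    return 1.0 - (a + b - a * b) ** (1.0 / p)
```

Each satisfies the boundary condition T(x, 1) = x, which is a quick sanity check.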
Evaluation Measure: Adjusted Rand Index (ARI)

Given two partitions U = \{u_1, u_2, \cdots, u_M\} and V = \{v_1, v_2, \cdots, v_N\}, let n_{ij} = |u_i \cap v_j|, n_{i\cdot} = \sum_{j=1}^{N} n_{ij}, n_{\cdot j} = \sum_{i=1}^{M} n_{ij}, and n = \sum_{i=1}^{M} \sum_{j=1}^{N} n_{ij}. Then

ARI = \frac{ \sum_{i=1}^{M} \sum_{j=1}^{N} \binom{n_{ij}}{2} - \frac{ab}{\binom{n}{2}} }{ \frac{1}{2}(a + b) - \frac{ab}{\binom{n}{2}} }

where a = \sum_{i=1}^{M} \binom{n_{i\cdot}}{2} and b = \sum_{j=1}^{N} \binom{n_{\cdot j}}{2}.
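A direct transcription of the ARI formula (our sketch using Python's `math.comb`; the degenerate case where the denominator is zero, e.g. both partitions all singletons, is not handled):

```python
from math import comb
from collections import Counter

def adjusted_rand_index(labels_u, labels_v):
    """ARI computed from the contingency counts n_ij = |u_i ∩ v_j|."""
    n = len(labels_u)
    n_ij = Counter(zip(labels_u, labels_v))   # joint cluster counts
    n_i = Counter(labels_u)                   # row sums n_i.
    n_j = Counter(labels_v)                   # column sums n_.j
    index = sum(comb(c, 2) for c in n_ij.values())
    a = sum(comb(c, 2) for c in n_i.values())
    b = sum(comb(c, 2) for c in n_j.values())
    expected = a * b / comb(n, 2)
    max_index = (a + b) / 2
    return (index - expected) / (max_index - expected)
```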
Four Data Sets Used in the Numerical Experiment

[Four scatter plots over [0, 1]^2: Data Set A, Data Set B, Data Set C, Data Set D]
Best ARI Values for Each Kernel and Data Set
Data Set A Data Set B Data Set C Data Set D
linear kernel 0.4535 0.5767 0.4650 -0.0054
RBF kernel 0.4880 0.5767 0.7611 0.1375
t-norm kernel (logical product) 0.0240 0.5146 0.2990 -0.0037
t-norm kernel (Mizumoto product) 0.4997 0.5528 0.4650 -0.0050
t-norm kernel (Dombi t-norm) 0.5237 0.5612 0.4717 0.0462
t-norm kernel (Dubois t-norm) 0.5117 0.5853 0.4757 0.0315
t-norm kernel (Frank t-norm) 0.4880 0.5767 0.4688 -0.0049
t-norm kernel (Hamacher t-norm) 0.4880 0.5767 0.4650 -0.0046
t-norm kernel (Schweizer t-norm 2) 0.5237 0.5767 0.4717 0.0477
t-norm kernel (Schweizer t-norm 3) 0.5237 0.5767 0.4717 0.0445
Data Set A
[Scatter plots of the resulting partitions: correct cluster; linear kernel; RBF kernel (σ = 8.52); t-norm kernel (Dombi, p = 1.98)]
Data Set B
[Scatter plots of the resulting partitions: correct cluster; linear kernel; RBF kernel (σ = 9.99); t-norm kernel (Dubois, p = 0.76)]
Data Set C
[Scatter plots of the resulting partitions: correct cluster; linear kernel; RBF kernel (σ = 0.28); t-norm kernel (Dubois, p = 0.38)]
Data Set D
[Scatter plots of the resulting partitions: correct cluster; linear kernel; RBF kernel (σ = 0.33); t-norm kernel (Dombi, p = 8.95)]
Conclusion
• The concept of the t-norm based additive kernel was proposed
• Numerical experiments:
  • ARI values obtained by the proposal were almost the same as, or higher than, those of the linear kernel on all data sets
  • the proposal slightly improved the ARI values for some data sets compared with the RBF kernel
  • the proposed method maps data to a higher-dimensional feature space than the linear kernel, but the dimension is lower than that of the RBF kernel
• The t-norm kernel with the Dubois t-norm had a low calculation cost compared with the RBF kernel
