The document discusses prototype-based machine learning and its applications in bio-medical domains. It provides an overview of unsupervised and supervised prototype-based learning techniques, including competitive learning, Kohonen's self-organizing map (SOM), and learning vector quantization (LVQ). Examples of applying these methods to cluster proteomics data and identify biomarkers for rheumatoid arthritis are also mentioned.
Generalizing Scientific Machine Learning and Differentiable Simulation Beyond...Chris Rackauckas
The combination of scientific models into deep learning structures, commonly referred to as scientific machine learning (SciML), has made great strides in the last few years in incorporating models such as ODEs and PDEs into deep learning through differentiable simulation. However, the vast space of scientific simulation also includes models like jump diffusions, agent-based models, and more. Is SciML constrained to the simple continuous cases, or is there a way to generalize to more advanced model forms? This talk will dive into the mathematical aspects of generalizing differentiable simulation to discuss cases like chaotic simulations, differentiating stochastic simulations like particle filters and agent-based models, and solving Bayesian inverse problems (i.e., differentiation of Markov Chain Monte Carlo methods). We will then discuss the evolving numerical stability issues, implementation issues, and other interesting mathematical tidbits that are coming to light as these differentiable programming capabilities are being adopted.
Bio: Dr. Chris Rackauckas is the VP of Modeling and Simulation at JuliaHub, the Director of Scientific Research at Pumas-AI, Co-PI of the Julia Lab at MIT, and the lead developer of the SciML Open Source Software Organization. His work in mechanistic machine learning is credited with the 15,000x acceleration of NASA Launch Services simulations and recently demonstrated a 60x-570x acceleration over Modelica tools in HVAC simulation, earning Chris the US Air Force Artificial Intelligence Accelerator Scientific Excellence Award. See more at https://chrisrackauckas.com/. He is the lead developer of the Pumas project and has received a top presentation award at every ACoP in the last 3 years for improving methods for uncertainty quantification, automated GPU acceleration of nonlinear mixed effects modeling (NLME), and machine learning assisted construction of NLME models with DeepNLME. For these achievements, Chris received the Emerging Scientist award from ISoP.
“Statistical Physics Studies of Machine Learning Problems" by Lenka Zdeborova, Researcher @CNRS
Abstract: We will discuss insights into the following questions: What makes the problems studied in machine learning and statistical physics related? How can this relation be used to better understand the performance and limitations of machine learning systems? What happens when a phase transition is found in a computational problem? How do phase transitions influence algorithmic hardness?
Tutorial at the Winter School on Machine Learning, Gran Canaria, January 2020 (ppsx format, 52 slides)
Michael Biehl, University of Groningen, The Netherlands
Camp IT: Making the World More Efficient Using AI & Machine LearningKrzysztof Kowalczyk
Slides from the introductory lecture I gave for students at Camp IT 2019. I tried to cover artificial intelligence, machine learning, the most popular algorithms, and their applications to business as broadly as possible; for in-depth material on the given topics, see the links and references in the presentation.
The Positive Effects of Fuzzy C-Means Clustering on Supervised Learning Class...Waqas Tariq
Selection of inputs is one of the most substantial components of classification algorithms for data mining and pattern recognition problems, since even the best classifier will perform badly if the inputs are not selected well. Big data and computational complexity are the main causes of poor performance and low accuracy in classical classifiers; in other words, the complexity of a classifier method is inversely proportional to its classification efficiency. For this purpose, two hybrid classifiers have been developed by cascading both type-1 and type-2 fuzzy c-means clustering with a classifier. In these proposed classifiers, a large number of data points is reduced by fuzzy c-means clustering before being applied as inputs to a classifier algorithm. The aim of this study is to investigate the effect of fuzzy clustering on well-known and useful classifiers such as artificial neural networks (ANN) and support vector machines (SVM). The positive effects of the proposed algorithms were then investigated on different data sets.
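The cascade described in this abstract can be sketched as follows. This is a hedged, minimal illustration in plain NumPy, not the authors' implementation: type-1 fuzzy c-means compresses each class to a few cluster centers, and those centers then serve as labelled prototypes for a simple nearest-prototype classifier (standing in here for the ANN/SVM stage). All function names and the toy data are invented for illustration.

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, n_iter=100, seed=0):
    """Type-1 fuzzy c-means: returns cluster centers and the membership matrix."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)              # memberships sum to 1 per point
    for _ in range(n_iter):
        W = U ** m                                 # fuzzified memberships
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        U = 1.0 / (d ** (2.0 / (m - 1.0)))         # standard FCM membership update
        U /= U.sum(axis=1, keepdims=True)
    return centers, U

def reduce_per_class(X, y, c):
    """Replace each class by its c fuzzy cluster centers (labelled prototypes)."""
    protos, labels = [], []
    for cls in np.unique(y):
        centers, _ = fuzzy_c_means(X[y == cls], c)
        protos.append(centers)
        labels.append(np.full(c, cls))
    return np.vstack(protos), np.concatenate(labels)

def nearest_prototype_predict(protos, labels, X):
    d = np.linalg.norm(X[:, None, :] - protos[None, :, :], axis=2)
    return labels[d.argmin(axis=1)]

# Toy data: two well-separated Gaussian blobs, one per class
rng = np.random.default_rng(1)
X = np.vstack([rng.normal([0, 0], 0.3, (200, 2)),
               rng.normal([3, 3], 0.3, (200, 2))])
y = np.array([0] * 200 + [1] * 200)
P, L = reduce_per_class(X, y, c=3)                 # 400 points -> 6 prototypes
acc = (nearest_prototype_predict(P, L, X) == y).mean()
```

The design point mirrors the abstract: the classifier downstream only ever sees the handful of cluster centers, so its cost no longer scales with the raw number of data points.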
Special Plenary Lecture at the International Conference on VIBRATION ENGINEERING AND TECHNOLOGY OF MACHINERY (VETOMAC), Lisbon, Portugal, September 10 - 13, 2018
http://www.conf.pt/index.php/v-speakers
Propagation of uncertainties in complex engineering dynamical systems is receiving increasing attention. When uncertainties are taken into account, the equations of motion of discretised dynamical systems can be expressed by coupled ordinary differential equations with stochastic coefficients. The computational cost for the solution of such a system mainly depends on the number of degrees of freedom and number of random variables. Among various numerical methods developed for such systems, the polynomial chaos based Galerkin projection approach shows significant promise because it is more accurate compared to the classical perturbation based methods and computationally more efficient compared to the Monte Carlo simulation based methods. However, the computational cost increases significantly with the number of random variables and the results tend to become less accurate for a longer length of time. In this talk novel approaches will be discussed to address these issues. Reduced-order Galerkin projection schemes in the frequency domain will be discussed to address the problem of a large number of random variables. Practical examples will be given to illustrate the application of the proposed Galerkin projection techniques.
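The polynomial chaos projection mentioned above can be illustrated with a minimal sketch. This is not the reduced-order frequency-domain schemes of the talk, just the basic idea: expand a random output, here the toy choice Y = exp(ξ) with ξ ~ N(0,1), in probabilists' Hermite polynomials via Gauss-Hermite quadrature; the zeroth coefficient recovers the mean and the higher coefficients give the variance. The truncation order P and node count are arbitrary choices for the example.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss, hermeval
from math import factorial, sqrt, pi

P = 8                                    # truncation order of the chaos expansion
x, w = hermegauss(40)                    # probabilists' Gauss-Hermite nodes/weights
w = w / sqrt(2 * pi)                     # normalize so the weights sum to 1 (a pdf)
f = np.exp(x)                            # model output Y = exp(xi) at the nodes

# Galerkin projection: c_k = E[Y He_k(xi)] / E[He_k(xi)^2], with E[He_k^2] = k!
coeffs = np.array([np.sum(w * f * hermeval(x, np.eye(P + 1)[k])) / factorial(k)
                   for k in range(P + 1)])

mean_pce = coeffs[0]                                              # PCE mean
var_pce = sum(factorial(k) * coeffs[k] ** 2 for k in range(1, P + 1))  # PCE variance
```

For this Y the exact values are E[Y] = e^{1/2} and Var[Y] = e^2 - e, so the quality of the truncation can be checked directly; in the stochastic-dynamics setting of the talk, the same projection is applied to the coefficients of the governing equations rather than to a closed-form output.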
Interpretable machine learning in endocrinology, M. Biehl, APPIS 2024University of Groningen
An introduction to interpretable machine learning in endocrinology.
In particular, the application of Generalized Matrix Relevance LVQ to the classification of adrenocortical tumors and the differential diagnosis of primary aldosteronism is given.
A tutorial given at the AMALEA workshop 2022.
This talk presents the statistical physics based theory of machine learning in terms of simple example systems. As a recent application, the occurrence of phase transitions in layered networks is discussed.
The statistical physics of learning revisited: Phase transitions in layered ne...University of Groningen
"The statistical physics of learning revisited: Phase transitions in layered neural networks"
Physics Colloquium at the University of Leipzig/Germany, June 29, 2021
24 slides, ca 45 minutes
Invited lecture on Machine Learning in Medicine at the joint "Integrated Omics" course of Hanze University and University Hospital UMCG, Groningen, The Netherlands
Short presentation (15 minutes) focusing on the application of unsupervised and supervised machine learning in the paper "Tissue- and development-stage specific mRNA and heterogeneous CNV signatures of human ribosomal proteins in normal and cancer samples".
Talk presented at WSOM 2016 in Houston/Texas.
Machine learning based classification of FDG-PET scan data for the diagnosis of neurodegenerative disorders
Introduction:
RNA interference (RNAi) or Post-Transcriptional Gene Silencing (PTGS) is an important biological process for modulating eukaryotic gene expression.
It is a highly conserved process of post-transcriptional gene silencing in which double-stranded RNA (dsRNA) causes sequence-specific degradation of mRNA sequences.
dsRNA-induced gene silencing (RNAi) has been reported in a wide range of eukaryotes, including worms, insects, mammals, and plants.
This process mediates resistance to both endogenous parasitic and exogenous pathogenic nucleic acids, and regulates the expression of protein-coding genes.
What are small ncRNAs?
micro RNA (miRNA)
short interfering RNA (siRNA)
Properties of small non-coding RNA:
Involved in silencing mRNA transcripts.
Called “small” because they are usually only about 21-24 nucleotides long.
Synthesized by first cutting up longer precursor sequences (like the 61nt one that Lee discovered).
Silence an mRNA by base pairing with some sequence on the mRNA.
Discovery of siRNA?
The first small RNA:
In 1993, Rosalind Lee (Victor Ambros lab) was studying a non-coding gene in C. elegans, lin-4, that was involved in silencing of another gene, lin-14, at the appropriate time in the development of the worm.
Two small transcripts of lin-4 (22nt and 61nt) were found to be complementary to a sequence in the 3' UTR of lin-14.
Because lin-4 encoded no protein, she deduced that it must be these transcripts that are causing the silencing by RNA-RNA interactions.
Types of RNAi (non-coding RNA):
miRNA: 23-25 nt long; trans-acting; binds the target mRNA with mismatches; causes translation inhibition.
siRNA: 21 nt long; cis-acting; binds the target mRNA as a perfectly complementary sequence.
piRNA: 25-36 nt long; expressed in germ cells; regulates transposon activity.
MECHANISM OF RNAI:
First the double-stranded RNA teams up with a protein complex named Dicer, which cuts the long RNA into short pieces.
Then another protein complex called RISC (RNA-induced silencing complex) discards one of the two RNA strands.
The RISC-docked, single-stranded RNA then pairs with the homologous mRNA and destroys it.
THE RISC COMPLEX:
RISC is a large (>500 kDa) multi-protein RNA-binding complex that triggers degradation of the target mRNA.
Unwinding of the double-stranded siRNA is carried out by an ATP-independent helicase.
The active component of RISC is the Argonaute (Ago) protein, an endonuclease that cleaves the target mRNA.
DICER: endonuclease (RNase III family)
Argonaute: Central Component of the RNA-Induced Silencing Complex (RISC)
One strand of the dsRNA produced by Dicer is retained in the RISC complex in association with Argonaute
ARGONAUTE PROTEIN :
1. PAZ (PIWI/Argonaute/Zwille): recognition of the target mRNA.
2. PIWI (P-element induced wimpy testis): breaks the phosphodiester bond of the mRNA (RNase H activity).
MiRNA:
Double-stranded RNAs are naturally produced in eukaryotic cells during development, and they have a key role in regulating gene expression.
Richard's entangled adventures in wonderlandRichard Gill
Since the loophole-free Bell experiments of 2020 and the Nobel prizes in physics of 2022, critics of Bell's work have retreated to the fortress of super-determinism. Now, super-determinism is a derogatory word - it just means "determinism". Palmer, Hance and Hossenfelder argue that quantum mechanics and determinism are not incompatible, using a sophisticated mathematical construction based on a subtle thinning of allowed states and measurements in quantum mechanics, such that what is left appears to make Bell's argument fail, without altering the empirical predictions of quantum mechanics. I think however that it is a smoke screen, and the slogan "lost in math" comes to my mind. I will discuss some other recent disproofs of Bell's theorem using the language of causality based on causal graphs. Causal thinking is also central to law and justice. I will mention surprising connections to my work on serial killer nurse cases, in particular the Dutch case of Lucia de Berk and the current UK case of Lucy Letby.
This PDF is about schizophrenia.
For more details, see the YouTube channel @SELF-EXPLANATORY:
https://www.youtube.com/channel/UCAiarMZDNhe1A3Rnpr_WkzA/videos
Thanks...!
Multi-source connectivity as the driver of solar wind variability in the heli...Sérgio Sacani
The ambient solar wind that fills the heliosphere originates from multiple sources in the solar corona and is highly structured. It is often described as high-speed, relatively homogeneous plasma streams from coronal holes and slow-speed, highly variable streams whose source regions are under debate. A key goal of ESA/NASA's Solar Orbiter mission is to identify solar wind sources and understand what drives the complexity seen in the heliosphere. By combining magnetic field modelling and spectroscopic techniques with high-resolution observations and measurements, we show that the solar wind variability detected in situ by Solar Orbiter in March 2022 is driven by spatio-temporal changes in the magnetic connectivity to multiple sources in the solar atmosphere. The magnetic field footpoints connected to the spacecraft moved from the boundaries of a coronal hole to one active region (12961) and then across to another region (12957). This is reflected in the in situ measurements, which show the transition from fast to highly Alfvénic then to slow solar wind that is disrupted by the arrival of a coronal mass ejection. Our results describe solar wind variability at 0.5 au but are applicable to near-Earth observatories.
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.Sérgio Sacani
The return of a sample of near-surface atmosphere from Mars would facilitate answers to several first-order science questions surrounding the formation and evolution of the planet. One of the important aspects of terrestrial planet formation in general is the role that primary atmospheres played in influencing the chemistry and structure of the planets and their antecedents. Studies of the martian atmosphere can be used to investigate the role of a primary atmosphere in its history. Atmosphere samples would also inform our understanding of the near-surface chemistry of the planet, and ultimately the prospects for life. High-precision isotopic analyses of constituent gases are needed to address these questions, requiring that the analyses are made on returned samples rather than in situ.
Slide 1: Title Slide
Extrachromosomal Inheritance
Slide 2: Introduction to Extrachromosomal Inheritance
Definition: Extrachromosomal inheritance refers to the transmission of genetic material that is not found within the nucleus.
Key Components: Involves genes located in mitochondria, chloroplasts, and plasmids.
Slide 3: Mitochondrial Inheritance
Mitochondria: Organelles responsible for energy production.
Mitochondrial DNA (mtDNA): Circular DNA molecule found in mitochondria.
Inheritance Pattern: Maternally inherited, meaning it is passed from mothers to all their offspring.
Diseases: Examples include Leber’s hereditary optic neuropathy (LHON) and mitochondrial myopathy.
Slide 4: Chloroplast Inheritance
Chloroplasts: Organelles responsible for photosynthesis in plants.
Chloroplast DNA (cpDNA): Circular DNA molecule found in chloroplasts.
Inheritance Pattern: Often maternally inherited in most plants, but can vary in some species.
Examples: Variegation in plants, where leaf color patterns are determined by chloroplast DNA.
Slide 5: Plasmid Inheritance
Plasmids: Small, circular DNA molecules found in bacteria and some eukaryotes.
Features: Can carry antibiotic resistance genes and can be transferred between cells through processes like conjugation.
Significance: Important in biotechnology for gene cloning and genetic engineering.
Slide 6: Mechanisms of Extrachromosomal Inheritance
Non-Mendelian Patterns: Do not follow Mendel’s laws of inheritance.
Cytoplasmic Segregation: During cell division, organelles like mitochondria and chloroplasts are randomly distributed to daughter cells.
Heteroplasmy: Presence of more than one type of organellar genome within a cell, leading to variation in expression.
Slide 7: Examples of Extrachromosomal Inheritance
Four O’clock Plant (Mirabilis jalapa): Shows variegated leaves due to different cpDNA in leaf cells.
Petite Mutants in Yeast: Result from mutations in mitochondrial DNA affecting respiration.
Slide 8: Importance of Extrachromosomal Inheritance
Evolution: Provides insight into the evolution of eukaryotic cells.
Medicine: Understanding mitochondrial inheritance helps in diagnosing and treating mitochondrial diseases.
Agriculture: Chloroplast inheritance can be used in plant breeding and genetic modification.
Slide 9: Recent Research and Advances
Gene Editing: Techniques like CRISPR-Cas9 are being used to edit mitochondrial and chloroplast DNA.
Therapies: Development of mitochondrial replacement therapy (MRT) for preventing mitochondrial diseases.
Slide 10: Conclusion
Summary: Extrachromosomal inheritance involves the transmission of genetic material outside the nucleus and plays a crucial role in genetics, medicine, and biotechnology.
Future Directions: Continued research and technological advancements hold promise for new treatments and applications.
Slide 11: Questions and Discussion
Invite Audience: Open the floor for any questions or further discussion on the topic.
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...Sérgio Sacani
We characterize the earliest galaxy population in the JADES Origins Field (JOF), the deepest imaging field observed with JWST. We make use of the ancillary Hubble optical images (5 filters spanning 0.4-0.9 µm) and novel JWST images with 14 filters spanning 0.8-5 µm, including 7 medium-band filters, and reaching total exposure times of up to 46 hours per filter. We combine all our data at > 2.3 µm to construct an ultradeep image, reaching as deep as ≈ 31.4 AB mag in the stack and 30.3-31.0 AB mag (5σ, r = 0.1" circular aperture) in individual filters. We measure photometric redshifts and use robust selection criteria to identify a sample of eight galaxy candidates at redshifts z = 11.5-15. These objects show compact half-light radii of R_1/2 ∼ 50-200 pc, stellar masses of M⋆ ∼ 10^7-10^8 M⊙, and star-formation rates of SFR ∼ 0.1-1 M⊙ yr^-1. Our search finds no candidates at 15 < z < 20, placing upper limits at these redshifts. We develop a forward-modeling approach to infer the properties of the evolving luminosity function, without binning in redshift or luminosity, that marginalizes over the photometric redshift uncertainty of our candidate galaxies and incorporates the impact of non-detections. We find a z = 12 luminosity function in good agreement with prior results, and that the luminosity function normalization and UV luminosity density decline by a factor of ∼ 2.5 from z = 12 to z = 14. We discuss the possible implications of our results in the context of theoretical models for the evolution of the dark matter halo mass function.
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...Ana Luísa Pinho
Functional Magnetic Resonance Imaging (fMRI) provides means to characterize brain activations in response to behavior. However, cognitive neuroscience has been limited to group-level effects referring to the performance of specific tasks. To obtain the functional profile of elementary cognitive mechanisms, the combination of brain responses to many tasks is required. Yet, to date, both structural atlases and parcellation-based activations do not fully account for cognitive function and still present several limitations. Further, they do not adapt overall to individual characteristics. In this talk, I will give an account of deep-behavioral phenotyping strategies, namely data-driven methods in large task-fMRI datasets, to optimize functional brain-data collection and improve inference of effects-of-interest related to mental processes. Key to this approach is the employment of fast multi-functional paradigms rich in features that can be well parametrized and, consequently, facilitate the creation of psycho-physiological constructs to be modelled with imaging data. Particular emphasis will be given to music stimuli when studying high-order cognitive mechanisms, due to their ecological nature and quality to enable complex behavior compounded by discrete entities. I will also discuss how deep-behavioral phenotyping and individualized models applied to neuroimaging data can better account for the subject-specific organization of domain-general cognitive systems in the human brain. Finally, the accumulation of functional brain signatures brings the possibility to clarify relationships among tasks and create a univocal link between brain systems and mental functions through: (1) the development of ontologies proposing an organization of cognitive processes; and (2) brain-network taxonomies describing functional specialization.
To this end, tools to improve commensurability in cognitive science are necessary, such as public repositories, ontology-based platforms and automated meta-analysis tools. I will thus discuss some brain-atlasing resources currently under development, and their applicability in cognitive as well as clinical neuroscience.
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...Sérgio Sacani
Since volcanic activity was first discovered on Io from Voyager images in 1979, changes on Io's surface have been monitored from both spacecraft and ground-based telescopes. Here, we present the highest spatial resolution images of Io ever obtained from a ground-based telescope. These images, acquired by the SHARK-VIS instrument on the Large Binocular Telescope, show evidence of a major resurfacing event on Io's trailing hemisphere. When compared to the most recent spacecraft images, the SHARK-VIS images show that a plume deposit from a powerful eruption at Pillan Patera has covered part of the long-lived Pele plume deposit. Although this type of resurfacing event may be common on Io, few have been detected due to the rarity of spacecraft visits and the previously low spatial resolution available from Earth-based telescopes. The SHARK-VIS instrument ushers in a new era of high-resolution imaging of Io's surface using adaptive optics at visible wavelengths.
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...University of Maribor
Slides from:
11th International Conference on Electrical, Electronics and Computer Engineering (IcETRAN), Niš, 3-6 June 2024
Track: Artificial Intelligence
https://www.etran.rs/2024/en/home-english/
1. Michael Biehl
Bernoulli Institute for Mathematics,
Computer Science and Artificial Intelligence
University of Groningen
The Netherlands
www.cs.rug.nl/~biehl
Prototype-based machine learning:
bio-medical applications
7. AMALEA 2022 4
overview
1. Introduction / Motivation
prototypes and exemplars, neural activation / learning
3. Supervised Learning
Learning Vector Quantization (LVQ)
Adaptive distances and Relevance Learning
2. Unsupervised Learning
Competitive Learning
Kohonen’s Self-Organizing Map (SOM)
Examples and illustrations: Bio-medical applications
- clustering of proteomics data
- biomarkers for rheumatoid arthritis
(- FDG-Pet brain scans)
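As a minimal sketch of the LVQ idea listed in the overview above (basic LVQ1 only, not the adaptive-distance/relevance-learning variants covered later), the following toy example moves the winning prototype toward samples of its own class and away from samples of other classes. Data, prototype initializations, and parameters are invented for illustration.

```python
import numpy as np

def lvq1_train(X, y, protos, plabels, lr=0.05, epochs=30, seed=0):
    """LVQ1: for each sample, attract the winning prototype if its class
    matches the sample's label, otherwise repel it."""
    rng = np.random.default_rng(seed)
    W = protos.copy()
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            j = np.linalg.norm(W - X[i], axis=1).argmin()   # winner (closest prototype)
            sign = 1.0 if plabels[j] == y[i] else -1.0      # attract or repel
            W[j] += sign * lr * (X[i] - W[j])
    return W

def lvq_predict(W, plabels, X):
    """Nearest-prototype classification."""
    d = np.linalg.norm(X[:, None, :] - W[None, :, :], axis=2)
    return plabels[d.argmin(axis=1)]

# Toy data: two Gaussian classes, one prototype per class
rng = np.random.default_rng(2)
X = np.vstack([rng.normal([0, 0], 0.4, (150, 2)),
               rng.normal([2.5, 2.5], 0.4, (150, 2))])
y = np.array([0] * 150 + [1] * 150)
plabels = np.array([0, 1])
W = lvq1_train(X, y, np.array([[0.5, 0.5], [2.0, 2.0]]), plabels)
acc = (lvq_predict(W, plabels, X) == y).mean()
```

Because the trained prototypes live in the original feature space, they can be inspected directly, which is exactly the transparency argument made later in the slides.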
1. Introduction
prototypes, exemplars:
representation of information in terms of
typical representatives (e.g. of a class of objects),
much debated concept in cognitive psychology
neural activation:
external stimulus to a network of neurons
response acc. to weights (expected inputs)
best matching unit (and neighbors)
weights represent different expected stimuli (prototypes)
learning: change of weights result in even stronger
response to similar stimuli in the future
14. AMALEA 2022 6
even independent from the above:
attractive framework for machine learning based data analysis
- trained system is parameterized in the feature space
- facilitates discussions with domain experts
- transparent (white box) and provides insights into the
applied criteria (classification, regression, clustering etc.)
- easy to implement, efficient computation
- versatile, successfully applied in many different application areas
15. AMALEA 2022 7
2. Unsupervised Learning
Potential aims:
dimension reduction: compression, visualization, ...
exploration of data structure: clustering, density estimation, ...
pre-processing: supervised learning, classification, regression, ...
Vector Quantization: identify (few) typical representatives
w_1, w_2, ..., w_K with w_k ∈ R^N
from a set of feature vectors x_1, x_2, ..., x_P with x^μ ∈ R^N
assign x^μ to the winning prototype w* = argmin_j d(w_j, x^μ),
for instance w.r.t. the squared Euclidean distance
d(w, x) = Σ_{n=1}^N (w_n − x_n)²
18. AMALEA 2022 8
, random sequence of single data:
… the winner takes it all:
initially: randomized wk
competitive learning
competition for updates
learning rate / step size η <1
⌘ (xµ
w⇤
)
w⇤
! w⇤
+ ⌘ (xµ
w⇤
)
19. AMALEA 2022 8
, random sequence of single data:
… the winner takes it all:
initially: randomized wk
competitive learning
competition for updates
learning rate / step size η <1
⌘ (xµ
w⇤
)
w⇤
! w⇤
+ ⌘ (xµ
w⇤
)
20. AMALEA 2022 8
, random sequence of single data:
… the winner takes it all:
initially: randomized wk
competitive learning
competition for updates
learning rate / step size η <1
⌘ (xµ
w⇤
)
w⇤
! w⇤
+ ⌘ (xµ
w⇤
)
competitive VQ = stochastic gradient descent w.r.t. Quantization Error
- assign each data to closest prototype
- measure the corresponding distance (e.g. squared Euclidean)
- sum over all assigned data points
measures the quality of the representation
defines a (one possible) criterion to evaluate / compare
the quality of different prototype configurations
{
QE
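The winner-takes-all update and the quantization error can be sketched in a few lines of NumPy (a minimal sketch; the function names and parameter values are illustrative, not from the talk):

```python
import numpy as np

def competitive_vq(data, n_prototypes=3, eta=0.05, epochs=50, seed=0):
    """Winner-takes-all competitive learning for vector quantization."""
    rng = np.random.default_rng(seed)
    # initially: randomized prototypes (here: randomly chosen data points)
    protos = data[rng.choice(len(data), n_prototypes, replace=False)].copy()
    for _ in range(epochs):
        # random sequence of single data points
        for x in data[rng.permutation(len(data))]:
            winner = np.argmin(((protos - x) ** 2).sum(axis=1))
            protos[winner] += eta * (x - protos[winner])  # w* <- w* + eta (x - w*)
    return protos

def quantization_error(data, protos):
    """Sum of squared Euclidean distances to the closest prototype."""
    d = ((data[:, None, :] - protos[None, :, :]) ** 2).sum(axis=2)
    return float(d.min(axis=1).sum())
```

Training typically reduces the quantization error relative to an arbitrary prototype placement, but the outcome depends on the initialization (local minima, see below).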
24. AMALEA 2022 9
data
initial
prototypes
dead
units
WTA training
general problem: local minima of the quantization error,
initialization-dependent outcome of training
competitive learning
improvement: rank-based updates (winner, second, third,… )
25. AMALEA 2022 9
data
initial
prototypes
dead
units
WTA training
general problem: local minima of the quantization error,
initialization-dependent outcome of training
competitive learning
improvement: rank-based updates (winner, second, third,… )
introduce rank-based neighborhood cooperativeness
[Martinetz, Berkovich, Schulten, IEEE Trans. Neural Netw. 1993]
Neural Gas: many prototypes to represent the density of data
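A single rank-based (Neural-Gas-style) update might look as follows, assuming an exponential rank decay (a minimal sketch; names and parameter values are illustrative):

```python
import numpy as np

def neural_gas_step(protos, x, eta=0.05, lam=1.0):
    """One rank-based update: every prototype moves towards x by an amount
    that decays exponentially with its distance rank (0 = winner)."""
    d = ((protos - x) ** 2).sum(axis=1)
    ranks = np.argsort(np.argsort(d))  # 0 for the winner, 1 for the runner-up, ...
    h = np.exp(-ranks / lam)           # rank-based neighborhood function
    return protos + eta * h[:, None] * (x - protos)
```

Because every prototype receives a rank-weighted update, prototypes that never win are still pulled towards the data, which mitigates the dead-unit problem.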
26. AMALEA 2022
Self-Organizing Map
T. Kohonen. Self-Organizing Maps. Springer (1995)
neighborhood cooperativeness on a predefined low-dimensional lattice:
a lattice A of neurons, i.e. prototypes w_r ∈ R^N at lattice positions r ∈ R^d
upon presentation of x^μ:
- determine the winner (best matching unit) w_s in feature space
(at position s in A)
- update the winner and its lattice neighborhood:
w_r → w_r + η h_ρ(r, s) (x^μ − w_r)
where h_ρ(r, s) = exp( − ||r − s||²_A / (2ρ²) )
with range ρ w.r.t. distances in the lattice A
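The SOM update above can be sketched as one learning step (a minimal sketch; the lattice layout and parameter values are illustrative):

```python
import numpy as np

def som_step(protos, lattice, x, eta=0.1, rho=1.0):
    """One SOM update.  protos: (K, N) prototypes w_r in feature space,
    lattice: (K, d) fixed positions r of the neurons on the lattice A."""
    # winner / best matching unit, determined in feature space
    s = np.argmin(((protos - x) ** 2).sum(axis=1))
    # neighborhood function h_rho(r, s), evaluated on the lattice
    h = np.exp(-((lattice - lattice[s]) ** 2).sum(axis=1) / (2 * rho ** 2))
    # w_r <- w_r + eta * h_rho(r, s) * (x - w_r)
    return protos + eta * h[:, None] * (x - protos)
```

Note that the winner is found in feature space, while the neighborhood is measured on the fixed low-dimensional lattice; this is what produces the topographic ordering.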
33. AMALEA 2022 12
Remarks
- many extensions of the basic concept, e.g.
cost-function-based SOM [Heskes]
Generative Topographic Mapping (GTM), a probabilistic
formulation of the mapping to the low-dim. lattice
[Bishop, Svensen, Williams, 1998]
- specific modifications of SOM or Neural Gas for
- time series / functional data
- “non-vectorial” relational data
- graphs and trees
- supervised learning
39. AMALEA 2022
transcription:
DNA è (m)RNA
translation
mRNA è
proteins
the “central dogma” of molecular biology
Ribosome
protein,è
function
40. AMALEA 2022
• is an ancient molecular machine, ‘3D-printer’ for proteins
the ribosome…
41. AMALEA 2022
• is an ancient molecular machine, ‘3D-printer’ for proteins
• ~ 107 cytoplasmic ribosomes per cell
the ribosome…
42. AMALEA 2022
• is an ancient molecular machine, ‘3D-printer’ for proteins
• ~ 107 cytoplasmic ribosomes per cell
• is believed to have universal function and the same
the ribosome…
43. AMALEA 2022
• is an ancient molecular machine, ‘3D-printer’ for proteins
• ~ 107 cytoplasmic ribosomes per cell
• is believed to have universal function and the same
composition in different tissues and across species
the ribosome…
44. AMALEA 2022
• is an ancient molecular machine, ‘3D-printer’ for proteins
• ~ 107 cytoplasmic ribosomes per cell
• is believed to have universal function and the same
composition in different tissues and across species
• consists of RNA and
the ribosome…
45. AMALEA 2022
• is an ancient molecular machine, ‘3D-printer’ for proteins
• ~ 107 cytoplasmic ribosomes per cell
• is believed to have universal function and the same
composition in different tissues and across species
• consists of RNA and
ribosomal proteins (RP)
the ribosome…
46. AMALEA 2022
• is an ancient molecular machine, ‘3D-printer’ for proteins
• ~ 107 cytoplasmic ribosomes per cell
• is believed to have universal function and the same
composition in different tissues and across species
• consists of RNA and
ribosomal proteins (RP)
the ribosome…
also coded by DNA which
is transcribed to mRNA
47. AMALEA 2022
• is an ancient molecular machine, ‘3D-printer’ for proteins
• ~ 107 cytoplasmic ribosomes per cell
• is believed to have universal function and the same
composition in different tissues and across species
• consists of RNA and
ribosomal proteins (RP)
the ribosome…
also coded by DNA which
is transcribed to mRNA
here: analysis of
RP mRNA expression
48. AMALEA 2022
Klijn et al. Nature Biotechnol. 33(3): 306-312 (2015)
675 cell lines
public domain data sets
GTeX (v6p) www.gtexportal.org
8,555 normal samples from 53 different tissues (with >50 samples)
TCGA (NCI-GDC, v7) www.cancer.gov
10363 tumor samples, 730 tumor-adjacent normals
49. AMALEA 2022
Klijn et al. Nature Biotechnol. 33(3): 306-312 (2015)
675 cell lines
public domain data sets
GTeX (v6p) www.gtexportal.org
8,555 normal samples from 53 different tissues (with >50 samples)
TCGA (NCI-GDC, v7) www.cancer.gov
10363 tumor samples, 730 tumor-adjacent normals
mRNA
expression
50. AMALEA 2022
Klijn et al. Nature Biotechnol. 33(3): 306-312 (2015)
675 cell lines
public domain data sets
GTeX (v6p) www.gtexportal.org
8,555 normal samples from 53 different tissues (with >50 samples)
TCGA (NCI-GDC, v7) www.cancer.gov
10363 tumor samples, 730 tumor-adjacent normals
normalization:
constant sum of reads
for each of the 78 RP
depending on method:
log-transform, z-score
mRNA
expression
51. AMALEA 2022
normal samples (GTeX)
whole blood - brain tissues - rest
different tissues have different RP mRNA signatures: PCA
57. AMALEA 2022
RP mRNA signatures vary with
Ÿ tissue type Ÿ tumor type and sub-type Ÿ developmental stage
RP translation rates are proportional to RP mRNA levels
RP mRNA and profiling different in cell cultures vs. cells in-vivo
main findings: (only very few presented here, see NAR paper)
summary/conclusion
58. AMALEA 2022
RP mRNA signatures vary with
Ÿ tissue type Ÿ tumor type and sub-type Ÿ developmental stage
RP translation rates are proportional to RP mRNA levels
RP mRNA and profiling different in cell cultures vs. cells in-vivo
main findings: (only very few presented here, see NAR paper)
speculative (yet plausible) conclusions:
RP composition and function (?)
Ÿ is tissue-, tumor-, development-, environment-specific
Ÿ adds a novel layer to the regulatory network of the cell
Ÿ might play an important role in cancer
summary/conclusion
59. AMALEA 2022
RP mRNA signatures vary with
Ÿ tissue type Ÿ tumor type and sub-type Ÿ developmental stage
RP translation rates are proportional to RP mRNA levels
RP mRNA and profiling different in cell cultures vs. cells in-vivo
main findings: (only very few presented here, see NAR paper)
speculative (yet plausible) conclusions:
RP composition and function (?)
Ÿ is tissue-, tumor-, development-, environment-specific
Ÿ adds a novel layer to the regulatory network of the cell
Ÿ might play an important role in cancer
caveats: composition could be independent of RP abundance
possible extra-ribosomal functions of RP
direct inspection of ribosome is difficult
summary/conclusion
62. AMALEA 2022
supervised learning
classification / regression / prediction
based on labeled example data
generic workflow:
example data model apply to novel data
training working
validation
estimate working performance
set parameters of model / training
compare different models
63. AMALEA 2022
supervised learning
classification / regression / prediction
based on labeled example data
generic workflow:
example data model apply to novel data
training working
obvious performance measures: overall / class-wise accuracy
ROC, Precision Recall ...
validation
estimate working performance
set parameters of model / training
compare different models
64. AMALEA 2022
21
supervised learning
classification / regression / prediction
based on labeled example data
generic workflow:
example data model apply to novel data
training working
obvious performance measures: overall / class-wise accuracy
ROC, Precision Recall ...
validation
estimate working performance
set parameters of model / training
compare different models
accuracy is not enough - interpretable “white-box” systems
example: prototype-based models, distance-based classifiers
66. AMALEA 2022
distance-based classifiers
a simple distance-based system: NN classifier
store a set of labeled examples
classify a query according to the
label of the Nearest Neighbor
in the data set
?
N-dim. feature space
67. AMALEA 2022
distance-based classifiers
a simple distance-based system: NN classifier
store a set of labeled examples
classify a query according to the
label of the Nearest Neighbor
in the data set
piece-wise linear decision
boundaries according to e.g.
(squared) Euclidean distance:
?
N-dim. feature space
d(xµ
, x⌫
) =
N
X
j=1
xµ
j x⌫
j
2
68. AMALEA 2022
distance-based classifiers
a simple distance-based system: NN classifier
store a set of labeled examples
classify a query according to the
label of the Nearest Neighbor
in the data set
piece-wise linear decision
boundaries according to e.g.
(squared) Euclidean distance:
?
N-dim. feature space
d(xµ
, x⌫
) =
N
X
j=1
xµ
j x⌫
j
2
69. AMALEA 2022
distance-based classifiers
a simple distance-based system: NN classifier
store a set of labeled examples
classify a query according to the
label of the Nearest Neighbor
in the data set
piece-wise linear decision
boundaries according to e.g.
(squared) Euclidean distance:
?
N-dim. feature space
+ conceptually simple,
+ no training phase
- expensive (storage, computation)
- sensitive to mislabeled data
- overly complex decision boundaries
d(xµ
, x⌫
) =
N
X
j=1
xµ
j x⌫
j
2
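A minimal sketch of the NN classifier (squared Euclidean distance, no training phase; names are illustrative):

```python
import numpy as np

def nn_classify(examples, labels, query):
    """Return the label of the stored example nearest to the query,
    w.r.t. the squared Euclidean distance."""
    d = ((examples - query) ** 2).sum(axis=1)
    return labels[np.argmin(d)]
```

Every stored example must be kept and compared at query time, which is exactly the storage/computation drawback listed above.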
70. AMALEA 2022
prototype based classification
a prototype based classifier [Kohonen 1990]
represent the data by one or
several prototypes per class
N-dim. feature space
71. AMALEA 2022
prototype based classification
a prototype based classifier [Kohonen 1990]
represent the data by one or
several prototypes per class
classify a query according to the
label of the nearest prototype
(or alternative schemes)
local decision boundaries
acc. to Euclidean distances
from the prototypes
piece-wise linear class borders
parameterized by prototypes
N-dim. feature space
72. AMALEA 2022
prototype based classification
a prototype based classifier [Kohonen 1990]
represent the data by one or
several prototypes per class
classify a query according to the
label of the nearest prototype
(or alternative schemes)
local decision boundaries
acc. to Euclidean distances
from the prototypes
piece-wise linear class borders
parameterized by prototypes
N-dim. feature space
+
less sensitive to outliers, lower storage needs,
little computational effort in the working phase
73. AMALEA 2022
prototype based classification
a prototype based classifier [Kohonen 1990]
represent the data by one or
several prototypes per class
classify a query according to the
label of the nearest prototype
(or alternative schemes)
local decision boundaries
acc. to Euclidean distances
from the prototypes
piece-wise linear class borders
parameterized by prototypes
N-dim. feature space
+
less sensitive to outliers, lower storage needs,
little computational effort in the working phase
-
training phase required in order to place prototypes,
model selection problem: number of prototypes per class etc.
74. AMALEA 2022
∙ identification of prototype vectors from labeled example data
∙ distance based classification (e.g. Euclidean)
Learning Vector Quantization
N-dimensional data, feature vectors
75. AMALEA 2022
∙ identification of prototype vectors from labeled example data
∙ distance based classification (e.g. Euclidean)
Learning Vector Quantization
N-dimensional data, feature vectors
competitive learning: heuristic LVQ1 [Kohonen, 1990]
76. AMALEA 2022
∙ identification of prototype vectors from labeled example data
∙ distance based classification (e.g. Euclidean)
Learning Vector Quantization
N-dimensional data, feature vectors
• initialize prototype vectors for each class
competitive learning: heuristic LVQ1 [Kohonen, 1990]
77. AMALEA 2022
∙ identification of prototype vectors from labeled example data
∙ distance based classification (e.g. Euclidean)
Learning Vector Quantization
N-dimensional data, feature vectors
• initialize prototype vectors for each class
competitive learning: heuristic LVQ1 [Kohonen, 1990]
• present a single example
78. AMALEA 2022
∙ identification of prototype vectors from labeled example data
∙ distance based classification (e.g. Euclidean)
Learning Vector Quantization
N-dimensional data, feature vectors
• initialize prototype vectors for each class
competitive learning: heuristic LVQ1 [Kohonen, 1990]
• identify the winner (closest prototype)
• present a single example
79. AMALEA 2022
∙ identification of prototype vectors from labeled example data
∙ distance based classification (e.g. Euclidean)
Learning Vector Quantization
N-dimensional data, feature vectors
• initialize prototype vectors for each class
competitive learning: heuristic LVQ1 [Kohonen, 1990]
• identify the winner (closest prototype)
• present a single example
• move the winner
- closer towards the data (same class)
80. AMALEA 2022
∙ identification of prototype vectors from labeled example data
∙ distance based classification (e.g. Euclidean)
Learning Vector Quantization
N-dimensional data, feature vectors
• initialize prototype vectors for each class
competitive learning: heuristic LVQ1 [Kohonen, 1990]
• identify the winner (closest prototype)
• present a single example
• move the winner
- closer towards the data (same class)
81. AMALEA 2022
∙ identification of prototype vectors from labeled example data
∙ distance based classification (e.g. Euclidean)
Learning Vector Quantization
N-dimensional data, feature vectors
• initialize prototype vectors for each class
competitive learning: heuristic LVQ1 [Kohonen, 1990]
• identify the winner (closest prototype)
• present a single example
• move the winner
- closer towards the data (same class)
82. AMALEA 2022
∙ identification of prototype vectors from labeled example data
∙ distance based classification (e.g. Euclidean)
Learning Vector Quantization
N-dimensional data, feature vectors
• initialize prototype vectors for each class
competitive learning: heuristic LVQ1 [Kohonen, 1990]
• identify the winner (closest prototype)
• present a single example
• move the winner
- closer towards the data (same class)
- away from the data (different class)
83. AMALEA 2022
∙ identification of prototype vectors from labeled example data
∙ distance based classification (e.g. Euclidean)
Learning Vector Quantization
N-dimensional data, feature vectors
• initialize prototype vectors for each class
competitive learning: heuristic LVQ1 [Kohonen, 1990]
• identify the winner (closest prototype)
• present a single example
• move the winner
- closer towards the data (same class)
- away from the data (different class)
• many variants, including
cost-function-based schemes, e.g.
Generalized LVQ (approximates # of misclassifications)
84. AMALEA 2022
∙ identification of prototype vectors from labeled example data
∙ distance based classification (e.g. squared Euclidean)
Learning Vector Quantization
N-dimensional data, feature vectors
∙ tesselation of feature space
[piece-wise linear]
∙ distance-based classification
[here: Euclidean distances]
∙ aim: generalization ability
correct classification of new data
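The LVQ1 steps listed above translate almost directly into code (a minimal sketch; the initialization and parameter values are illustrative, not from the talk):

```python
import numpy as np

def lvq1(X, y, protos, proto_labels, eta=0.05, epochs=30, seed=0):
    """Heuristic LVQ1 [Kohonen, 1990]: move the winner towards the example
    if the class labels agree, away from it otherwise."""
    protos = protos.astype(float).copy()
    rng = np.random.default_rng(seed)
    for _ in range(epochs):
        for i in rng.permutation(len(X)):       # present single examples
            d = ((protos - X[i]) ** 2).sum(axis=1)
            w = np.argmin(d)                    # winner (closest prototype)
            sign = 1.0 if proto_labels[w] == y[i] else -1.0
            protos[w] += sign * eta * (X[i] - protos[w])
    return protos

def lvq_predict(protos, proto_labels, X):
    """Nearest-prototype classification."""
    d = ((X[:, None, :] - protos[None, :, :]) ** 2).sum(axis=2)
    return proto_labels[d.argmin(axis=1)]
```

After training, only the few prototypes need to be stored and compared, in contrast to the NN classifier.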
85. AMALEA 2022
LVQ distance measures
? key question: appropriate distance / (dis-) similarity measure
fixed, pre-defined distance measures:
(G)LVQ can formulated for general (differentiable) distances
86. AMALEA 2022
LVQ distance measures
? key question: appropriate distance / (dis-) similarity measure
fixed, pre-defined distance measures:
(G)LVQ can formulated for general (differentiable) distances
examples: Minkowski distances (p≠2), correlation based,
statistical divergences, ... not necessarily metrics!
87. AMALEA 2022
LVQ distance measures
? key question: appropriate distance / (dis-) similarity measure
fixed, pre-defined distance measures:
(G)LVQ can formulated for general (differentiable) distances
examples: Minkowski distances (p≠2), correlation based,
statistical divergences, ... not necessarily metrics!
standard work-flow
- consider several distance measures
- compare performances in, e.g., cross-validation
88. AMALEA 2022
LVQ distance measures
? key question: appropriate distance / (dis-) similarity measure
fixed, pre-defined distance measures:
(G)LVQ can formulated for general (differentiable) distances
examples: Minkowski distances (p≠2), correlation based,
statistical divergences, ... not necessarily metrics!
standard work-flow
- consider several distance measures
- compare performances in, e.g., cross-validation
elegant approach:
Relevance Learning / adaptive distances
- employ parameterized distance measure
- optimize in the data-driven training process (cost function!)
89. AMALEA 2022
Generalized Matrix Relevance LVQ
generalized quadratic distance in LVQ: [Schneider, Biehl, Hammer, 2009]
d(w, x) = (w x)
>
⇤ (w x)
(GMLVQ)
90. AMALEA 2022
Generalized Matrix Relevance LVQ
generalized quadratic distance in LVQ: [Schneider, Biehl, Hammer, 2009]
d(w, x) = (w x)
>
⇤ (w x)
(GMLVQ)
91. AMALEA 2022
Generalized Matrix Relevance LVQ
generalized quadratic distance in LVQ: [Schneider, Biehl, Hammer, 2009]
d(w, x) = (w x)
>
⇤ (w x)
(GMLVQ)
= [ ⌦ (w x) ]
2
92. AMALEA 2022
GMLVQ
generalized quadratic distance in LVQ: [Schneider, Biehl, Hammer, 2009]
training: adaptation of prototypes
and distance measure guided by
GLVQ cost function
= [ ⌦ (w x) ]
2
d(w, x) = (w x)
>
⇤ (w x)
Generalized Matrix Relevance LVQ:
93. AMALEA 2022
GMLVQ
generalized quadratic distance in LVQ: [Schneider, Biehl, Hammer, 2009]
variants:
one global, several local, class-wise relevance matrices
rectangular low-dim. representation / visualization
[Bunte et al., 2012]
diagonal matrices: single feature weights [Hammer et al., 2002]
training: adaptation of prototypes
and distance measure guided by
GLVQ cost function
= [ ⌦ (w x) ]
2
d(w, x) = (w x)
>
⇤ (w x)
Generalized Matrix Relevance LVQ:
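The GMLVQ distance in its factored form can be sketched as follows (distance only; the full method also adapts Ω and the prototypes by gradient descent on the GLVQ cost function, which is omitted here):

```python
import numpy as np

def gmlvq_distance(w, x, omega):
    """d(w, x) = (w - x)^T Lambda (w - x) with Lambda = Omega^T Omega,
    i.e. the squared Euclidean norm of Omega (w - x)."""
    return float(((omega @ (w - x)) ** 2).sum())
```

A rectangular Ω (fewer rows than columns) simultaneously yields a low-dimensional representation of the data, as in the visualization variant [Bunte et al., 2012].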
95. AMALEA 2022
But this is just Mahalonobis distance…
[Mahalonobis, 1936]
S covariance matrix of random vectors
(calculated once from the data, fixed definition, not adaptive)
x 2 RN
(‘two point version’)
No.
dM (x, y) =
q
(x y)> S 1 (x y)
96. AMALEA 2022
But this is just Mahalonobis distance…
[Mahalonobis, 1936]
S covariance matrix of random vectors
(calculated once from the data, fixed definition, not adaptive)
x 2 RN
if you insist…
(‘two point version’)
So it is a generalized Mahalonobis distance ?
No.
dM (x, y) =
q
(x y)> S 1 (x y)
97. AMALEA 2022
But this is just Mahalonobis distance…
[Mahalonobis, 1936]
S covariance matrix of random vectors
(calculated once from the data, fixed definition, not adaptive)
x 2 RN
if you insist…
(‘two point version’)
So it is a generalized Mahalonobis distance ?
No.
a generalized
broccoli
E = ~!
a generalization
of Ohm’s Law
dM (x, y) =
q
(x y)> S 1 (x y)
99. AMALEA 2022 99
interpretation
summarizes
• the contribution of a single dimension
• the relevance of original features in the classifier
⇤ij
quantifies the contribution of the pair
of features (i,j) to the distance
after training:
prototypes represent typical class properties or subtypes (hope)
Relevance Matrix
100. AMALEA 2022 100
interpretation
summarizes
• the contribution of a single dimension
• the relevance of original features in the classifier
Note: interpretation assumes implicitly that
features have equal order of magnitude
e.g. after z-score-transformation →
(averages over data set)
⇤ij
quantifies the contribution of the pair
of features (i,j) to the distance
after training:
prototypes represent typical class properties or subtypes (hope)
Relevance Matrix
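The z-score transformation mentioned in the note is simply (a one-function sketch):

```python
import numpy as np

def z_score(X):
    """Scale each feature (column) to zero mean and unit variance,
    with mean and standard deviation taken over the data set."""
    return (X - X.mean(axis=0)) / X.std(axis=0)
```

On such rescaled features, the magnitudes of the relevance entries become directly comparable across features.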
101. Urine Steroid Metabolomics as a Biomarker Tool for
Detecting Malignancy in Patients with Adrenal Tumors
www.ensat.org
W. Arlt, M. Biehl, A. Taylor, S. Hahner, R. Libé, B. Hughes, P. Schneider,
D. Smith, H. Stiekema, N. Krone, E. Porfiri, G. Opocher, J. Bertherat,
F. Mantero, B. Allolio, M. Terzolo, P. Nightingale, C. Shackleton,
X. Bertagna, M. Fassnacht, P. Stewart
J Clinical Endocrinology & Metabolism 96: 3775-3784 (2011)
tumor classification
insight: marker selection, patented diagnosis tool
follow-up: recurrence detection, other disorders, tumor sub-types...
104. AMALEA 2022 104
two recent application examples
I) cytokine expression data:
- insights into disease mechanisms of (early) rheumatoid arthritis,
based on synovial tissue samples
- ~ 50 samples represented by 117 cytokine expressions in synovial tissue;
PCA + GMLVQ combined
II) FDG-PET brain scans:
- ultimate goal: diagnosis of neurodegenerative diseases
- ~ 100 samples, ~ 200,000 voxels per scan; SSM/PCA + GMLVQ combined
106. Early diagnosis (?) of Rheumatoid Arthritis
Expression of chemokines CXCL4 and CXCL7 by synovial
macrophages defines an early stage of rheumatoid arthritis
Annals of the Rheumatic Diseases 75:763-771 (2016)
L. Yeo, N. Adlard, M. Biehl, M. Juarez, M. Snow
C.D. Buckley, A. Filer, K. Raza, D. Scheel-Toellner
107. AMALEA 2022 34
Rheumatoid Arthritis (RA)
- chronic inflammatory disease
- the immune system affects the joints
- RA leads to deformation and disability
110. AMALEA 2022
uninflamed control established RA early inflammation
resolving early RA
ultimate goals:
understand pathogenesis and
mechanism of progression
rheumatoid arthritis (RA)
111. AMALEA 2022
uninflamed control established RA early inflammation
resolving early RA
cytokine based diagnosis of RA
at earliest possible stage ?
ultimate goals:
understand pathogenesis and
mechanism of progression
rheumatoid arthritis (RA)
115. AMALEA 2022
GMLVQ analysis
pre-processing
• log-transformed expression values
• 21 leading principal components explain 95% of the variation
Two binary problems: (A) established RA vs. uninflamed controls (!)
(B) early RA vs. resolving inflammation (")
• 1 prototype per class, global relevance matrix, distance measure:
x 2 I
R117
, x = e
x 2 I
R21
d(e
x, e
w) = (e
x e
w)
> e
⇤ (e
x e
w) = (x w)
> > e
⇤
| {z }
⇤
(x w)
116. AMALEA 2022
GMLVQ analysis
pre-processing
• log-transformed expression values
• 21 leading principal components explain 95% of the variation
Two binary problems: (A) established RA vs. uninflamed controls (!)
(B) early RA vs. resolving inflammation (")
• 1 prototype per class, global relevance matrix, distance measure:
x 2 I
R117
, x = e
x 2 I
R21
d(e
x, e
w) = (e
x e
w)
> e
⇤ (e
x e
w) = (x w)
> > e
⇤
| {z }
⇤
(x w)
d(e
x, e
w) = (e
x e
w)
> e
⇤ (e
x e
w) = (x w)
> > e
⇤
| {z }
⇤
(x w)
in
<latexit sha1_base64="S6bznRCRGpJCx80D30CfTaQNebg=">AAAB83icbVDLSsNAFL2pr1pfUZduBovgqiSi6M6CG1dSxT6giWUynbRDJ5MwMxFK6G8I6kIRt36Av+HOv3HSdqGtBwYO59zLPXOChDOlHefbKiwsLi2vFFdLa+sbm1v29k5DxakktE5iHstWgBXlTNC6ZprTViIpjgJOm8HgIveb91QqFotbPUyoH+GeYCEjWBvJ8yKs+0GQ3Yzurjp22ak4Y6B54k5J+fzzMcdTrWN/ed2YpBEVmnCsVNt1Eu1nWGpGOB2VvFTRBJMB7tG2oQJHVPnZOPMIHRili8JYmic0Gqu/NzIcKTWMAjOZZ1SzXi7+57VTHZ75GRNJqqkgk0NhypGOUV4A6jJJieZDQzCRzGRFpI8lJtrUVDIluLNfnieNo4p7UnGunXL1GCYowh7swyG4cApVuIQa1IFAAg/wAq9Waj1bb9b7ZLRgTXd24Q+sjx8yWJYz</latexit>
RN
117. AMALEA 2022
GMLVQ analysis
pre-processing
• log-transformed expression values
• 21 leading principal components explain 95% of the variation
Two binary problems: (A) established RA vs. uninflamed controls (!)
(B) early RA vs. resolving inflammation (")
• 1 prototype per class, global relevance matrix, distance measure:
• leave-two-out validation (one from each class)
evaluation in terms of Receiver Operating Characteristics
x 2 I
R117
, x = e
x 2 I
R21
d(e
x, e
w) = (e
x e
w)
> e
⇤ (e
x e
w) = (x w)
> > e
⇤
| {z }
⇤
(x w)
d(e
x, e
w) = (e
x e
w)
> e
⇤ (e
x e
w) = (x w)
> > e
⇤
| {z }
⇤
(x w)
in
<latexit sha1_base64="S6bznRCRGpJCx80D30CfTaQNebg=">AAAB83icbVDLSsNAFL2pr1pfUZduBovgqiSi6M6CG1dSxT6giWUynbRDJ5MwMxFK6G8I6kIRt36Av+HOv3HSdqGtBwYO59zLPXOChDOlHefbKiwsLi2vFFdLa+sbm1v29k5DxakktE5iHstWgBXlTNC6ZprTViIpjgJOm8HgIveb91QqFotbPUyoH+GeYCEjWBvJ8yKs+0GQ3Yzurjp22ak4Y6B54k5J+fzzMcdTrWN/ed2YpBEVmnCsVNt1Eu1nWGpGOB2VvFTRBJMB7tG2oQJHVPnZOPMIHRili8JYmic0Gqu/NzIcKTWMAjOZZ1SzXi7+57VTHZ75GRNJqqkgk0NhypGOUV4A6jJJieZDQzCRzGRFpI8lJtrUVDIluLNfnieNo4p7UnGunXL1GCYowh7swyG4cApVuIQa1IFAAg/wAq9Waj1bb9b7ZLRgTXd24Q+sjx8yWJYz</latexit>
RN
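The ROC evaluation can be summarized by the area under the curve, which equals the probability that a positive case receives a higher classifier score than a negative one (a minimal sketch of that identity, not the original evaluation code):

```python
import numpy as np

def roc_auc(scores, labels):
    """Area under the ROC curve, computed as the probability that a
    randomly chosen positive outscores a randomly chosen negative
    (ties counted half)."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    wins = (pos[:, None] > neg[None, :]).mean()
    ties = (pos[:, None] == neg[None, :]).mean()
    return float(wins + 0.5 * ties)
```

In a leave-two-out scheme, the scores of the held-out pairs can be pooled over all splits before computing the curve.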
118. AMALEA 2022
false positive rate
true
positive
rate
diagonal Λii vs. cytokine index i
(A) established RA vs.
uninflamed control
Relevances
diagonal relevances
leave-one-out
119. AMALEA 2022
false positive rate
true
positive
rate
t
rue
positive
rate
diagonal Λii vs. cytokine index i
(A) established RA vs.
uninflamed control
(B) early RA vs.
resolving inflammation
Relevances
diagonal relevances
leave-one-out
120. AMALEA 2022
false positive rate
true
positive
rate
t
rue
positive
rate
diagonal Λii vs. cytokine index i
(A) established RA vs.
uninflamed control
(B) early RA vs.
resolving inflammation
Relevances
diagonal relevances
leave-one-out
122. AMALEA 2022
CXCL4 chemokine (C-X-C motif) ligand 4
CXCL7 chemokine (C-X-C motif) ligand 7
direct study on protein level, staining / imaging of sinovial tissue:
macrophages : predominant source of CXCL4/7 expression
protein level studies
123. AMALEA 2022
CXCL4 chemokine (C-X-C motif) ligand 4
CXCL7 chemokine (C-X-C motif) ligand 7
direct study on protein level, staining / imaging of sinovial tissue:
macrophages : predominant source of CXCL4/7 expression
protein level studies
• high levels of CXCL4 and
CXLC7 in early RA
• expression on macrophages
outside of blood vessels
discriminates
early RA / resolving cases
124. AMALEA 2022
false positive rate
true
positive
rate
t
rue
positive
rate
diagonal Λii vs. cytokine index i
(A) established RA vs.
uninflamed control
(B) early RA vs.
resolving inflammation
relevant cytokines
macrophage
stimulating 1
diagonal relevances
leave-one-out
125. Machine learning analysis of FDG-PET
brain images for the diagnosis of
neurodegenerative disorders
K.L. Leenders, S. Meles, … UMCG Groningen, Neurology
R. van Veen, S. Lövdal Bernoulli Institute, Computer Science
…
128. AMALEA 2022 42
Subjects
Source HC PD AD
CUN 19 49 -
UGOSM 44 58 55
UMCG 19 20 21
FDG-PET brain scans from 3 centers
• Clínica Universidad de Navarra
• Univ. Genova/IRCCS San Martino
• Univ. Medical Center Groningen
Glucose
uptake
http://glimpsproject.com
subjects
A
B
C
FDG-PET 3D images
Fluorodeoxyglucose
positron emission tomography
Healthy Controls HC
Parkinson’s Disease PD
Alzheimer’s Disease AD
data
130. AMALEA 2022 43
work flow
subjects
~
200000
voxels
subject specific
anatomy
high intensity,
low noise voxels
log-transform
double centering
masking (*)
low-dimensional
projections (*) details of pre-processing:
D. Mudali et al.
Computational and Mathematical Methods in Medicine.
March 2015, Art.ID 136921 and references. therein
(*) Scaled Subprofile Model / PCA based
on a disjoint reference group of subjects
131. AMALEA 2022 43
work flow
subjects
~
200000
voxels
subject specific
anatomy
high intensity,
low noise voxels
log-transform
double centering
masking (*)
low-dimensional
projections (*)
subject
socres
subjects
details of pre-processing:
D. Mudali et al.
Computational and Mathematical Methods in Medicine.
March 2015, Art.ID 136921, and references therein
(*) Scaled Subprofile Model / PCA based
on a disjoint reference group of subjects
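A toy numpy sketch of this pre-processing chain (masking omitted; all sizes and the reference-group split are illustrative, not those of the study):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(0.5, 2.0, size=(30, 1000))   # 30 subjects x 1000 masked voxels

logX = np.log(X)                              # log-transform of voxel intensities
# double centering: remove subject (row) and voxel (column) means
D = (logX - logX.mean(axis=1, keepdims=True)
          - logX.mean(axis=0, keepdims=True) + logX.mean())

# Scaled Subprofile Model / PCA style projection, with principal directions
# taken from a disjoint reference group (here: the first 10 subjects)
ref = D[:10]
_, _, Vt = np.linalg.svd(ref - ref.mean(axis=0), full_matrices=False)
scores = D @ Vt[:5].T                         # low-dimensional subject scores
print(scores.shape)                           # (30, 5)
```

After double centering, both the per-subject and per-voxel means of `D` are zero, so the subsequent projection captures covariation patterns rather than offsets.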
133. AMALEA 2022 44
work flow
subjects, ~200000 voxels each
→ subject scores
→ classification: GMLVQ, SVM (labels: condition)
applied to a novel test subject
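The classification step can be illustrated with a minimal prototype-based learner. This is plain LVQ1 with one prototype per class, not the full GMLVQ with relevance learning used here, and the "subject scores" are synthetic:

```python
import numpy as np

def lvq1_fit(X, y, lr=0.01, epochs=30, seed=0):
    """Minimal LVQ1: one prototype per class, attracted by same-class
    samples, repelled by other-class samples."""
    rng = np.random.default_rng(seed)
    classes = np.unique(y)
    protos = np.array([X[y == c].mean(axis=0) for c in classes])  # init at class means
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            j = int(np.argmin(((protos - X[i]) ** 2).sum(axis=1)))  # winner prototype
            sign = 1.0 if classes[j] == y[i] else -1.0
            protos[j] += sign * lr * (X[i] - protos[j])
    return classes, protos

def lvq1_predict(X, classes, protos):
    d = ((X[:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)
    return classes[np.argmin(d, axis=1)]

# synthetic two-class "subject scores"
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0.0, 1.0, (40, 2)), rng.normal(3.0, 1.0, (40, 2))])
y = np.array([0] * 40 + [1] * 40)

classes, protos = lvq1_fit(X, y)
acc = (lvq1_predict(X, classes, protos) == y).mean()
print(acc)
```

GMLVQ extends this by learning the relevance matrix Λ jointly with the prototypes, so that the distance itself adapts to the discriminative directions.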
137. AMALEA 2022 45
two classifiers
(A) Perceptron of optimal stability (aka “SVM with linear kernel”)
- linear threshold classifier
- large margin (with errors)
(B) Learning Vector Quantization (Generalized Matrix LVQ)
- prototype- and distance-based classifier
- relevance learning
performance evaluation:
averages over 10 randomized runs of 10-fold cross-validation
accuracies, sensitivity / specificity
Receiver Operating Characteristics for binary classification
both classifiers outperformed Decision Trees in previous projects
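The evaluation protocol (averages over 10 randomized runs of 10-fold cross-validation) can be sketched as follows; a nearest-class-mean classifier stands in for GMLVQ / SVM, and the data are synthetic:

```python
import numpy as np

def fit_ncm(X, y):
    """Nearest-class-mean classifier, a stand-in for GMLVQ / SVM."""
    classes = np.unique(y)
    return classes, np.array([X[y == c].mean(axis=0) for c in classes])

def predict_ncm(model, X):
    classes, means = model
    d = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=-1)
    return classes[np.argmin(d, axis=1)]

def cv_accuracy(X, y, n_runs=10, n_folds=10, seed=0):
    """Mean accuracy over n_runs randomized runs of n_folds-fold CV."""
    rng = np.random.default_rng(seed)
    accs = []
    for _ in range(n_runs):
        idx = rng.permutation(len(X))             # fresh shuffle per run
        for fold in np.array_split(idx, n_folds):
            train = np.setdiff1d(idx, fold)       # all indices not in the fold
            model = fit_ncm(X[train], y[train])
            accs.append((predict_ncm(model, X[fold]) == y[fold]).mean())
    return float(np.mean(accs))

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0.0, 1.0, (50, 3)), rng.normal(2.0, 1.0, (50, 3))])
y = np.array([0] * 50 + [1] * 50)
acc = cv_accuracy(X, y)
print(acc)
```

Sensitivity, specificity, and ROC curves are obtained the same way, by collecting per-fold predictions (or classifier scores) instead of only accuracies.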
139. AMALEA 2022 46
results
subjects from one center only (here: UGOSM)
[Figure: ROC of unbiased classifiers]
relatively good within-center performance,
also in three-class settings
143. AMALEA 2022 48
results
PD vs. AD
subjects from one center only (here: UGOSM)
[Figure: ROC of unbiased classifiers]
subjects from centers combined for training and testing
(example: UMCG and UGOSM)
reasonable (yet lower) overall performance,
also in the other classification problems
145. AMALEA 2022 49
results
PD vs. HC
within center (example: UGOSM)
[Figure: ROC of unbiased classifiers]
across centers: poor performance
147. AMALEA 2022
50
results/conclusions
experiment: classify subjects according to medical center
UMCG vs. UGOSM, here: AD patients only
possible explanations:
- center-specific (pre-)processing,
  despite identical equipment and work flows
- significantly different patient cohorts (not the case)
need for more consistent protocols, calibration / pre-processing
aim: unified classifiers with good inter-center performance
148. AMALEA 2022 51
Matlab:
K. Bunte: Relevance and Matrix adaptation in Learning Vector
Quantization (GRLVQ, GMLVQ and LiRaM LVQ) -> code
F. Westerman, R. van Veen, M. Biehl: A no-nonsense beginners’ tool for GMLVQ
http://www.cs.rug.nl/~biehl/gmlvq

Python:
sklvq: Scikit Learning Vector Quantization
R. van Veen, G.J. de Vries, M. Biehl, JMLR 22 (2021), 1-6
scikit-learn compatible LVQ implementations
from the machine learning group at CITEC Bielefeld
https://www.cs.rug.nl/~biehl/

Java:
plug-in for WEKA from the CI Group Mittweida
M. Kästner, T. Villmann