Michael Biehl
Bernoulli Institute for Mathematics,
Computer Science and Artificial Intelligence
University of Groningen
The Netherlands
www.cs.rug.nl/~biehl
Prototype-based machine learning:
bio-medical applications
AMALEA 2022 2
review: WIRES Cognitive Science (2016)
overview
1. Introduction / Motivation
prototypes and exemplars, neural activation / learning
2. Unsupervised Learning
Competitive Learning
Kohonen’s Self-Organizing Map (SOM)
3. Supervised Learning
Learning Vector Quantization (LVQ)
Adaptive distances and Relevance Learning
Examples and illustrations: Bio-medical applications
- clustering of proteomics data
- biomarkers for rheumatoid arthritis
(- FDG-PET brain scans)
1. Introduction
prototypes, exemplars:
representation of information in terms of
typical representatives (e.g. of a class of objects),
much debated concept in cognitive psychology
neural activation:
external stimulus to a network of neurons
response acc. to weights (expected inputs)
best matching unit (and neighbors)
weights represent different expected stimuli (prototypes)
learning: changes of the weights result in an even stronger
response to similar stimuli in the future
even independent from the above:
attractive framework for machine learning based data analysis
- trained system is parameterized in the feature space
- facilitates discussions with domain experts
- transparent (white box) and provides insights into the
applied criteria (classification, regression, clustering etc.)
- easy to implement, efficient computation
- versatile, successfully applied in many different application areas
2. Unsupervised Learning
Potential aims:
dimension reduction: compression, visualization, ...
exploration of data structure: clustering, density estimation, ...
pre-processing: for supervised learning (classification, regression, ...)
Vector Quantization: identify (few) typical representatives
$w^1, w^2, \ldots, w^K$ with $w^k \in \mathbb{R}^N$
from a set of feature vectors
$x^1, x^2, \ldots, x^P$ with $x^\mu \in \mathbb{R}^N$
assign $x^\mu$ to the winning prototype $w^{*} = \arg\min_{w^j} d(w^j, x^\mu)$,
for instance w.r.t. the squared Euclidean distance
$d(w, x) = \sum_{n=1}^{N} (w_n - x_n)^2$
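The assignment rule above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names and the toy numbers are hypothetical and do not come from the talk.

```python
def squared_euclidean(w, x):
    """Squared Euclidean distance d(w, x) = sum_n (w_n - x_n)^2."""
    return sum((wn - xn) ** 2 for wn, xn in zip(w, x))


def winner_index(prototypes, x):
    """Index j of the prototype w^j that is closest to x (the winner)."""
    distances = [squared_euclidean(w, x) for w in prototypes]
    return min(range(len(prototypes)), key=distances.__getitem__)


# Toy example (hypothetical values): K = 3 prototypes in N = 2 dimensions
prototypes = [[0.0, 0.0], [1.0, 1.0], [4.0, 0.0]]
x = [0.9, 1.2]
j = winner_index(prototypes, x)
print("winner:", j, "distance:", squared_euclidean(prototypes[j], x))
```

Ties are broken in favour of the lowest index here, which is one arbitrary choice among several.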
competitive learning
initially: randomized $w^k$, random sequence of single data $x^\mu$:
… the winner takes it all (competition for updates):
$w^{*} \to w^{*} + \eta \, (x^\mu - w^{*})$
learning rate / step size $\eta < 1$
competitive VQ = stochastic gradient descent w.r.t. the Quantization Error (QE):
- assign each data point to the closest prototype
- measure the corresponding distance (e.g. squared Euclidean)
- sum over all assigned data points
The QE measures the quality of the representation and
defines a (one possible) criterion to evaluate / compare
the quality of different prototype configurations
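A minimal sketch of winner-takes-all training and of the quantization error, assuming squared Euclidean distance and prototypes initialized at randomly chosen data points. The initialization, the function names and the default values are illustrative assumptions, not the setup used in the talk.

```python
import random


def squared_euclidean(w, x):
    return sum((wn - xn) ** 2 for wn, xn in zip(w, x))


def quantization_error(prototypes, data):
    """Sum over all data points of the distance to the closest prototype."""
    return sum(min(squared_euclidean(w, x) for w in prototypes) for x in data)


def wta_step(prototypes, x, eta):
    """Update only the winner: w* <- w* + eta * (x - w*). Returns its index."""
    j = min(range(len(prototypes)),
            key=lambda k: squared_euclidean(prototypes[k], x))
    prototypes[j] = [wn + eta * (xn - wn) for wn, xn in zip(prototypes[j], x)]
    return j


def train_competitive_vq(data, n_prototypes, eta=0.05, epochs=20, seed=0):
    rng = random.Random(seed)
    # assumption: initialize prototypes at randomly selected data points
    prototypes = [list(x) for x in rng.sample(data, n_prototypes)]
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # random sequence of single data
            wta_step(prototypes, x, eta)
    return prototypes


# Toy data (hypothetical): two Gaussian clusters in two dimensions
rng = random.Random(1)
data = ([[rng.gauss(0.0, 0.3), rng.gauss(0.0, 0.3)] for _ in range(50)]
        + [[rng.gauss(3.0, 0.3), rng.gauss(3.0, 0.3)] for _ in range(50)])
prototypes = train_competitive_vq(data, n_prototypes=2)
print("QE after training:", quantization_error(prototypes, data))
```

Comparing the printed QE across different seeds is a simple way to observe the initialization dependence of the outcome.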
competitive learning
[Figure: WTA training; labels: data, initial prototypes, dead units]
general problem: local minima of the quantization error,
initialization-dependent outcome of training
improvement: rank-based updates (winner, second, third, …)
introduce rank-based neighborhood cooperativeness
Neural Gas: many prototypes to represent the density of data
[Martinetz, Berkovich, Schulten, IEEE Trans. Neural Netw. 1993]
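A rank-based update in the spirit of Neural Gas can be sketched as follows. The exponential rank weighting exp(-rank / lam) follows the commonly cited form of the algorithm; the parameter names and default values are assumptions made for this illustration.

```python
import math


def neural_gas_step(prototypes, x, eta=0.1, lam=1.0):
    """Move every prototype towards x, weighted by exp(-rank / lam).

    rank 0 is the winner, rank 1 the second closest prototype, and so on.
    Returns the prototype indices ordered by increasing distance to x.
    """
    dists = [sum((wn - xn) ** 2 for wn, xn in zip(w, x)) for w in prototypes]
    order = sorted(range(len(prototypes)), key=dists.__getitem__)
    for rank, k in enumerate(order):
        h = math.exp(-rank / lam)
        prototypes[k] = [wn + eta * h * (xn - wn)
                         for wn, xn in zip(prototypes[k], x)]
    return order


# Toy example (hypothetical values)
prototypes = [[0.0, 0.0], [2.0, 0.0], [10.0, 0.0]]
print(neural_gas_step(prototypes, [0.5, 0.0], eta=0.5, lam=1.0))
print(prototypes)
```

Because all prototypes receive an update, a prototype that never wins is still pulled towards the data, which is how the rank-based scheme reduces the risk of dead units.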
Self-Organizing Map
T. Kohonen. Self-Organizing Maps. Springer (1995)
neighborhood cooperativeness on a predefined low-dim. lattice
lattice $A$ of neurons, i.e. prototypes $w_r \in \mathbb{R}^N$ at positions $r \in \mathbb{R}^d$
upon presentation of $x^\mu$:
- determine the winner (best matching unit) $w_s$
in feature space (at position $s$ in $A$)
- update winner and lattice neighborhood:
$w_r \to w_r + \eta \, h_\rho(r, s) \, (x^\mu - w_r)$
where $h_\rho(r, s) = \exp\left( - \frac{\| r - s \|_A^2}{2 \rho^2} \right)$
with range $\rho$ w.r.t. distances in lattice $A$
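The SOM update can be sketched as a single training step. The representation of the lattice as a list of position tuples, the function name and the default values are assumptions made for this illustration.

```python
import math


def som_step(weights, positions, x, eta=0.1, rho=1.0):
    """One SOM update for a single example x.

    weights[i]   : prototype w_r in feature space
    positions[i] : its lattice position r (tuple of coordinates)
    Returns the index of the best matching unit.
    """
    dists = [sum((wn - xn) ** 2 for wn, xn in zip(w, x)) for w in weights]
    s = min(range(len(weights)), key=dists.__getitem__)  # winner in feature space
    for i, r in enumerate(positions):
        # squared distance to the winner, measured in the lattice
        lattice_d2 = sum((a - b) ** 2 for a, b in zip(r, positions[s]))
        h = math.exp(-lattice_d2 / (2.0 * rho ** 2))
        weights[i] = [wn + eta * h * (xn - wn) for wn, xn in zip(weights[i], x)]
    return s


# Toy example (hypothetical): chain of three units, one-dimensional features
positions = [(0,), (1,), (2,)]
weights = [[0.0], [1.0], [2.0]]
print(som_step(weights, positions, [0.0], eta=0.5, rho=1.0), weights)
```

In practice both eta and rho are usually decreased during training; that schedule is left out of this sketch.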
Self-Organizing Map
- lattice deforms in $\mathbb{R}^N$, reflecting the density of observations
[Figure: SOM lattice adapting to the data, © Wikipedia]
SOM provides topology preserving low-dim. representation
for inspection and visualization of structured datasets
e.g. SOM toolbox (Matlab), Helsinki University of Technology
http://www.cis.hut.fi/somtoolbox/
- select lattice type, size and shape (PCA-based defaults)
- select training prescription, training time etc.
- compute quantization error etc.
- visualize the SOM grid, distances between prototypes ...
Remarks
- many extensions of the basic concept, e.g.
cost function based SOM [Heskes]
Generative Topographic Map (GTM), probabilistic
formulation of the mapping to low-dim. lattice
[Bishop, Svensen, Williams, 1998]
specific modifications of SOM or Neural Gas for
- time series / functional data
- “non-vectorial” relational data
- graphs and trees
- supervised learning
example application
the “central dogma” of molecular biology
transcription: DNA → (m)RNA
translation (ribosome): mRNA → proteins
protein → function
the ribosome…
• is an ancient molecular machine, ‘3D-printer’ for proteins
• ~ $10^7$ cytoplasmic ribosomes per cell
• is believed to have universal function and the same
composition in different tissues and across species
• consists of RNA and ribosomal proteins (RP),
also coded by DNA which is transcribed to mRNA
here: analysis of RP mRNA expression
public domain data sets
Klijn et al. Nature Biotechnol. 33(3): 306-312 (2015)
675 cell lines
GTEx (v6p) www.gtexportal.org
8,555 normal samples from 53 different tissues (with >50 samples)
TCGA (NCI-GDC, v7) www.cancer.gov
10,363 tumor samples, 730 tumor-adjacent normals
mRNA expression
normalization:
constant sum of reads
for each of the 78 RP
depending on method:
log-transform, z-score
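The pre-processing steps named on the slide can be sketched as below. The slide does not specify the target sum, the logarithm base, or how zero counts are treated, so the values of `total` and `pseudocount` and the use of the natural logarithm are assumptions, and the read counts are made up.

```python
import math


def normalize_constant_sum(counts, total=1.0):
    """Rescale one sample so that its read counts sum to `total`."""
    s = sum(counts)
    return [total * c / s for c in counts]


def log_transform(values, pseudocount=1e-6):
    """Natural log with a small pseudocount to avoid log(0) (assumption)."""
    return [math.log(v + pseudocount) for v in values]


def z_score(column):
    """Zero mean and unit variance for one feature across samples.

    Assumes the feature is not constant (standard deviation > 0).
    """
    n = len(column)
    mean = sum(column) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in column) / n)
    return [(v - mean) / std for v in column]


# Toy example (hypothetical read counts for three RP genes in two samples)
samples = [[10.0, 30.0, 60.0], [5.0, 5.0, 10.0]]
normalized = [normalize_constant_sum(s) for s in samples]
logged = [log_transform(s) for s in normalized]
first_gene = z_score([s[0] for s in logged])
print(normalized, first_gene)
```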
different tissues have different RP mRNA signatures: PCA
normal samples (GTEx)
[Figure: PCA projection; groups: whole blood - brain tissues - rest]
normal samples: SOM, labelled post-hoc
[Figure: two labelled SOM lattices. (i) individual tissue labels (majority assignments), with unit labels ranging from 1 to 53. (ii) suggested groups of tissues, with unit labels ranging from 1 to 15; the colour bar runs from 15 down to 1 and the group names are listed as: rest, ovary, liver, adrenal gland, pancreas, muscle, heart, cells (fibroblasts), cells (ebv), skin, pituitary, brain (rest), cerebellum, testis, blood]
normal samples: SOM, labelled post-hoc
different tissues have different RP mRNA signatures
[Figure: labelled SOM; legend entry: "if majority < 50%"]
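Post-hoc labelling by majority assignment can be sketched as follows. The sketch assumes the best matching unit of every sample is already known; the function name, the tissue names and the unit indices are hypothetical.

```python
from collections import Counter, defaultdict


def label_units_post_hoc(winners, labels):
    """Majority label and majority fraction for every SOM unit.

    winners[i] : index of the best matching unit of sample i
    labels[i]  : class (e.g. tissue) label of sample i
    Units that win no sample do not appear in the result.
    """
    per_unit = defaultdict(Counter)
    for unit, label in zip(winners, labels):
        per_unit[unit][label] += 1
    result = {}
    for unit, counts in per_unit.items():
        label, n = counts.most_common(1)[0]
        result[unit] = (label, n / sum(counts.values()))
    return result


# Toy example (hypothetical assignments)
winners = [0, 0, 0, 1, 1, 1, 1, 1]
labels = ["blood", "blood", "skin", "liver", "liver", "skin", "heart", "brain"]
for unit, (label, fraction) in sorted(label_units_post_hoc(winners, labels).items()):
    flag = " (majority < 50%)" if fraction < 0.5 else ""
    print(unit, label, round(fraction, 2), flag)
```

Reporting the majority fraction alongside the label makes it possible to mark units whose majority stays below 50%, as the figure legend does.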
summary/conclusion
main findings: (only very few presented here, see NAR paper)
RP mRNA signatures vary with
• tissue type • tumor type and sub-type • developmental stage
RP translation rates are proportional to RP mRNA levels
RP mRNA and profiling differ in cell cultures vs. cells in vivo
speculative (yet plausible) conclusions:
RP composition and function (?)
• is tissue-, tumor-, development-, environment-specific
• adds a novel layer to the regulatory network of the cell
• might play an important role in cancer
caveats: composition could be independent of RP abundance
possible extra-ribosomal functions of RP
direct inspection of ribosome is difficult
supervised learning
classification / regression / prediction
based on labeled example data
generic workflow:
training: example data → model
working: apply model to novel data
validation:
- estimate working performance
- set parameters of model / training
- compare different models
obvious performance measures: overall / class-wise accuracy,
ROC, Precision Recall ...
accuracy is not enough - interpretable “white-box” systems
example: prototype-based models, distance-based classifiers
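The validation step and the accuracy measures can be sketched as follows. The k-fold scheme without shuffling and all function names are assumptions chosen for brevity; the slides do not prescribe a particular validation protocol.

```python
def k_fold_indices(n_samples, k):
    """Yield (train, validation) index lists for k disjoint folds (no shuffling)."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for validation in folds:
        held_out = set(validation)
        train = [i for i in range(n_samples) if i not in held_out]
        yield train, validation


def overall_accuracy(y_true, y_pred):
    """Fraction of all examples that are classified correctly."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def class_wise_accuracy(y_true, y_pred):
    """Fraction of correctly classified examples, separately for each true class."""
    totals, hits = {}, {}
    for t, p in zip(y_true, y_pred):
        totals[t] = totals.get(t, 0) + 1
        hits[t] = hits.get(t, 0) + (1 if t == p else 0)
    return {c: hits[c] / totals[c] for c in totals}


# Toy example (hypothetical labels and predictions)
y_true = ["a", "a", "b", "b"]
y_pred = ["a", "b", "b", "b"]
print(overall_accuracy(y_true, y_pred), class_wise_accuracy(y_true, y_pred))
print(list(k_fold_indices(6, 3)))
```

Class-wise accuracies are worth reporting next to the overall value when classes are imbalanced, since a high overall accuracy can hide a poorly recognized minority class.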
distance-based classifiers
a simple distance-based system: NN classifier
store a set of labeled examples
classify a query according to the
label of the Nearest Neighbor
in the data set
[Figure: labeled examples and a query (?) in N-dim. feature space]
piece-wise linear decision
boundaries according to e.g.
(squared) Euclidean distance:
$d(x^\mu, x^\nu) = \sum_{j=1}^{N} \left( x_j^\mu - x_j^\nu \right)^2$
+ conceptually simple,
+ no training phase
- expensive (storage, computation)
- sensitive to mislabeled data
- overly complex decision boundaries
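A nearest neighbor classifier takes only a few lines, which illustrates both its simplicity and its cost: every query is compared with all stored examples. The function names and the toy data are hypothetical.

```python
def squared_euclidean(a, b):
    return sum((aj - bj) ** 2 for aj, bj in zip(a, b))


def nn_classify(examples, labels, query):
    """Return the label of the stored example closest to the query."""
    dists = [squared_euclidean(x, query) for x in examples]
    nearest = min(range(len(examples)), key=dists.__getitem__)
    return labels[nearest]


# Toy example (hypothetical data)
examples = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]]
labels = ["A", "A", "B"]
print(nn_classify(examples, labels, [4.0, 4.0]))
print(nn_classify(examples, labels, [0.4, 0.1]))
```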
prototype based classification
a prototype based classifier [Kohonen 1990]
represent the data by one or
several prototypes per class
classify a query according to the
label of the nearest prototype
(or alternative schemes)
local decision boundaries
acc. to Euclidean distances
from the prototypes
piece-wise linear class borders
parameterized by prototypes
[Figure: prototypes and class borders in N-dim. feature space]
+ less sensitive to outliers, lower storage needs,
little computational effort in the working phase
- training phase required in order to place prototypes,
model selection problem: number of prototypes per class etc.
Learning Vector Quantization
N-dimensional data, feature vectors
∙ identification of prototype vectors from labeled example data
∙ distance based classification (e.g. Euclidean)
competitive learning: heuristic LVQ1 [Kohonen, 1990]
• initialize prototype vectors for each class
• present a single example
• identify the winner (closest prototype)
• move the winner
- closer towards the data (same class)
- away from the data (different class)
• many variants, including
cost-function-based schemes, e.g.
Generalized LVQ (approximates # of misclassifications)
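The LVQ1 steps listed above can be sketched as a single update function plus a nearest prototype classifier. Squared Euclidean distance, the function names and the learning rate are assumptions made for this illustration.

```python
def squared_euclidean(w, x):
    return sum((wn - xn) ** 2 for wn, xn in zip(w, x))


def nearest_prototype(prototypes, x):
    dists = [squared_euclidean(w, x) for w in prototypes]
    return min(range(len(prototypes)), key=dists.__getitem__)


def lvq1_step(prototypes, proto_labels, x, y, eta=0.1):
    """Move the winner towards x if its label equals y, away from x otherwise."""
    j = nearest_prototype(prototypes, x)
    sign = 1.0 if proto_labels[j] == y else -1.0
    prototypes[j] = [wn + sign * eta * (xn - wn)
                     for wn, xn in zip(prototypes[j], x)]
    return j


def classify(prototypes, proto_labels, x):
    """Label of the nearest prototype."""
    return proto_labels[nearest_prototype(prototypes, x)]


# Toy example (hypothetical values): one prototype per class
prototypes = [[0.0, 0.0], [4.0, 4.0]]
proto_labels = ["A", "B"]
lvq1_step(prototypes, proto_labels, [1.0, 1.0], "A", eta=0.5)  # attraction
lvq1_step(prototypes, proto_labels, [3.0, 3.0], "A", eta=0.5)  # repulsion
print(prototypes, classify(prototypes, proto_labels, [1.0, 0.5]))
```

The repulsive step has no built-in bound, which is one reason why cost-function-based variants are often preferred over the heuristic.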
Learning Vector Quantization
N-dimensional data, feature vectors
∙ identification of prototype vectors from labeled example data
∙ distance based classification (e.g. squared Euclidean)
∙ tessellation of feature space
[piece-wise linear]
∙ distance-based classification
[here: Euclidean distances]
∙ aim: generalization ability
correct classification of new data
LVQ distance measures
? key question: appropriate distance / (dis-) similarity measure
fixed, pre-defined distance measures:
(G)LVQ can formulated for general (differentiable) distances
examples: Minkowski distances (p≠2), correlation based,
statistical divergences, ... not necessarily metrics!
standard work-flow
- consider several distance measures
- compare performances in, e.g., cross-validation
elegant approach:
Relevance Learning / adaptive distances
- employ parameterized distance measure
- optimize in the data-driven training process (cost function!)
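The standard work-flow above (try several fixed distances, compare them in cross-validation) might look as follows. This is only a sketch under stated assumptions: a nearest-class-mean classifier stands in for trained LVQ prototypes, and the names `minkowski` and `cv_accuracy`, the fold count and the toy data are hypothetical.

```python
import numpy as np

def minkowski(a, b, p):
    """Minkowski distance of order p between the rows of a and the vector b."""
    return np.sum(np.abs(a - b) ** p, axis=-1) ** (1.0 / p)

def cv_accuracy(X, y, p, n_folds=5, seed=0):
    """k-fold CV accuracy of a nearest-class-mean classifier for distance order p."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), n_folds)
    correct = 0
    for test_idx in folds:
        train = np.ones(len(X), dtype=bool)
        train[test_idx] = False
        classes = np.unique(y[train])
        # stand-in prototypes: class means of the training fold (an assumption)
        protos = np.array([X[train & (y == c)].mean(axis=0) for c in classes])
        for i in test_idx:
            winner = int(np.argmin(minkowski(protos, X[i], p)))
            correct += int(classes[winner] == y[i])
    return correct / len(X)

# toy usage on hypothetical data: compare p = 1, 2, 4
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (60, 4)), rng.normal(3.0, 1.0, (60, 4))])
y = np.array([0] * 60 + [1] * 60)
results = {p: cv_accuracy(X, y, p) for p in (1, 2, 4)}
```

Relevance learning replaces this outer loop over hand-picked distances by a single parameterized distance whose parameters are trained together with the prototypes.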
Generalized Matrix Relevance LVQ (GMLVQ)
generalized quadratic distance in LVQ: [Schneider, Biehl, Hammer, 2009]
$d(w, x) = (w - x)^\top \Lambda \, (w - x) = [\, \Omega \, (w - x) \,]^2$ with $\Lambda = \Omega^\top \Omega$
training: adaptation of prototypes
and distance measure guided by
GLVQ cost function
variants:
one global, several local, class-wise relevance matrices
rectangular matrices: low-dim. representation / visualization
[Bunte et al., 2012]
diagonal matrices: single feature weights [Hammer et al., 2002]
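A small numerical sketch of the generalized quadratic distance may help. It assumes the parameterization Λ = ΩᵀΩ, which makes the distance non-negative, and the common normalization trace(Λ) = 1. The helper names and random values are hypothetical; `glvq_mu` is the relative distance difference that the GLVQ cost function sums over the training examples (after an optional monotonic transformation).

```python
import numpy as np

def gmlvq_distance(w, x, omega):
    """d(w, x) = (w - x)^T Lambda (w - x) with Lambda = Omega^T Omega."""
    diff = omega @ (w - x)
    return float(diff @ diff)

def glvq_mu(d_correct, d_wrong):
    """Relative distance difference in [-1, 1]; negative if classified correctly."""
    return (d_correct - d_wrong) / (d_correct + d_wrong)

rng = np.random.default_rng(1)
omega = rng.normal(size=(3, 3))           # hypothetical Omega, not a trained one
omega /= np.sqrt(np.sum(omega ** 2))      # normalization: trace(Lambda) = 1
lam = omega.T @ omega                     # relevance matrix Lambda
w, x = rng.normal(size=3), rng.normal(size=3)
d = gmlvq_distance(w, x, omega)
```

A rectangular `omega` with fewer rows than columns gives the low-dimensional variant, and a diagonal `omega` reduces to single feature weights.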
But this is just Mahalanobis distance…
No.
[Mahalanobis, 1936]
$d_M(x, y) = \sqrt{(x - y)^\top S^{-1} (x - y)}$ (‘two point version’)
S: covariance matrix of random vectors $x \in \mathbb{R}^N$
(calculated once from the data, fixed definition, not adaptive)
So it is a generalized Mahalanobis distance?
No.
if you insist…
[illustrations: “a generalized broccoli”; “$E = \hbar\omega$: a generalization of Ohm’s Law”]
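For contrast with the adaptive relevance matrix, the Mahalanobis distance can be computed as below. The point of the sketch is that S is estimated once from the data and then stays fixed. The synthetic data and the helper name `mahalanobis` are assumptions for illustration.

```python
import numpy as np

def mahalanobis(x, y, S_inv):
    """Two-point Mahalanobis distance sqrt((x - y)^T S^{-1} (x - y))."""
    diff = x - y
    return float(np.sqrt(diff @ S_inv @ diff))

rng = np.random.default_rng(2)
mixing = np.array([[2.0, 0.0, 0.0],
                   [0.5, 1.0, 0.0],
                   [0.0, 0.0, 0.2]])
data = rng.normal(size=(200, 3)) @ mixing   # hypothetical correlated data
S = np.cov(data, rowvar=False)              # computed once from the data, then fixed
S_inv = np.linalg.inv(S)
d_m = mahalanobis(data[0], data[1], S_inv)
```

Nothing in this computation depends on class labels or on a training objective, which is the difference to the relevance matrix in GMLVQ.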
interpretation
after training:
prototypes represent typical class properties or subtypes (hope)
Relevance Matrix
$\Lambda_{ij}$ quantifies the contribution of the pair
of features (i,j) to the distance
$\Lambda_{ii}$ summarizes
• the contribution of a single dimension
• the relevance of original features in the classifier
Note: interpretation assumes implicitly that
features have equal order of magnitude
e.g. after z-score-transformation $x_i \to (x_i - \langle x_i \rangle) / \sigma_i$
(averages over data set)
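Reading relevances off a trained model could be sketched like this. The matrix `omega` below is a made-up example rather than a trained one, and the helper names are hypothetical. The diagonal is computed as Λ_ii = Σ_j Ω_ji², which follows from Λ = ΩᵀΩ.

```python
import numpy as np

def zscore(X):
    """Rescale every feature to zero mean and unit variance over the data set."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

def feature_relevances(omega):
    """Return the diagonal of Lambda = Omega^T Omega together with Lambda itself."""
    lam = omega.T @ omega
    return np.diag(lam), lam

omega = np.array([[0.8, 0.1, 0.0],
                  [0.0, 0.5, 0.1],
                  [0.0, 0.0, 0.3]])        # hypothetical Omega for illustration
rel, lam = feature_relevances(omega)
ranking = np.argsort(rel)[::-1]            # most relevant feature first

rng = np.random.default_rng(6)
raw = rng.normal(5.0, 3.0, size=(100, 3))  # hypothetical raw features
Z = zscore(raw)
```

Applying `zscore` before training puts all features on an equal order of magnitude, so that the entries of `rel` can be compared with each other.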
Urine Steroid Metabolomics as a Biomarker Tool for
Detecting Malignancy in Patients with Adrenal Tumors
www.ensat.org
W. Arlt, M. Biehl, A. Taylor, S. Hahner, R. Libé, B. Hughes, P. Schneider,
D. Smith, H. Stiekema, N. Krone, E. Porfiri, G. Opocher, J. Bertherat,
F. Mantero, B. Allolio, M. Terzolo, P. Nightingale, C. Shackleton,
X. Bertagna, M.Fassnacht, P. Stewart
J Clinical Endocrinology & Metabolism 96: 3775-3784 (2011)
tumor classification
insight: marker selection, patented diagnosis tool
follow-up: recurrence detection, other disorders, tumor sub-types...
two recent application examples
I) cytokine expression data:
- insights into disease mechanisms of (early) rheumatoid arthritis
based on synovial tissue samples
~ 50 samples represented by 117 cytokine expressions
in synovial tissue, PCA+GMLVQ combined
II) FDG-PET brain scans:
- ultimate goal: diagnosis of neurodegenerative diseases
~ 100 samples, ~200000 voxels per scan
SSM/PCA+GMLVQ combined
Early diagnosis (?) of Rheumatoid Arthritis
Expression of chemokines CXCL4 and CXCL7 by synovial
macrophages defines an early stage of rheumatoid arthritis
Annals of the Rheumatic Diseases 75:763-771 (2016)
L. Yeo, N. Adlard, M. Biehl, M. Juarez, M. Snow
C.D. Buckley, A. Filer, K. Raza, D. Scheel-Toellner
Rheumatoid Arthritis (RA)
- chronic inflammatory disease
- immune system affects joints
- RA leads to deformation and disability
rheumatoid arthritis (RA)
[figure: synovial tissue groups: uninflamed control, established RA, early inflammation (resolving / early RA)]
ultimate goals:
understand pathogenesis and
mechanism of progression
cytokine based diagnosis of RA
at earliest possible stage ?
mRNA extraction real-time PCR
tissue section
synovium
synovial tissue cytokine expression
IL1A IL17F FASL CXCL4 CCL15 TGFB1 KITLG
IL1B IL18 CD70 CXCL5 CCL16 TGFB2 MST1
IL1RN IL19 CD30L CXCL6 CCL17 TGFB3 SPP1
IL2 IL20 4-1BB-L CXCL7 CCL18 EGF SFRP1
IL3 IL21 TRAIL CXCL9 CCL19 FGF2 ANXA1
IL4 IL22 RANKL CXCL10 CCL20 TGFA TNFRSF13B
IL5 IL23A TWEAK CXCL11 CCL21 IGF2 IL6R
IL6 IL24 APRIL CXCL12 CCL22 VEGFA NAMPT
IL7 IL25 BAFF CXCL13 CCL23 VEGFB C1QTNF3
IL8 IL26 LIGHT CXCL14 CCL24 MIF VCAM1
IL9 IL27 TL1A CXCL16 CCL25 LIF LGALS1
IL10 IL28A GITRL CCL1 CCL26 OSM LGALS9
IL11 IL29 FASLG CCL2 CCL27 ADIPOQ LGALS3
IL12A IL32 IFNA1 CCL3 CCL28 LEP LGALS12
IL12B IL33 IFNA2 CCL4 XCL1 GHRL
IL13 LTA IFNB1 CCL5 XCL2 RETN
IL14 TNF IFNG CCL7 CX3CL1 CTLA4
IL15 LTB CXCL1 CCL8 CSF1 EPO
IL16 OX40L CXCL2 CCL11 CSF2 TPO
IL17A CD40L CXCL3 CCL13 CSF3 FLT3LG
panel of 117 cytokines
• cell signaling proteins
• regulate immune response
• produced by, e.g.
T-cells, macrophages,
lymphocytes, fibroblasts, etc.
GMLVQ analysis
pre-processing
• log-transformed expression values $x \in \mathbb{R}^{117}$
• 21 leading principal components explain 95% of the variation: $\Psi x = \tilde{x} \in \mathbb{R}^{21}$
Two binary problems: (A) established RA vs. uninflamed controls
(B) early RA vs. resolving inflammation
• 1 prototype per class, global relevance matrix, distance measure:
$d(\tilde{x}, \tilde{w}) = (\tilde{x} - \tilde{w})^\top \tilde{\Lambda} \, (\tilde{x} - \tilde{w}) = (x - w)^\top \underbrace{\Psi^\top \tilde{\Lambda} \, \Psi}_{\Lambda} (x - w)$
($\Lambda$: relevance matrix in the original feature space $\mathbb{R}^N$)
• leave-two-out validation (one from each class)
evaluation in terms of Receiver Operating Characteristics
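The back-projection of the relevance matrix from PCA space to the original features can be checked numerically. The sketch below uses random stand-in data, so the number of retained components will not be 21. The variable names, the SVD-based PCA and the random `omega_t` are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(50, 117))                 # stand-in for log-expression data
mean = X.mean(axis=0)

# PCA via SVD of the centred data; the rows of Vt are the principal directions
_, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
explained = np.cumsum(s ** 2) / np.sum(s ** 2)
k = int(np.searchsorted(explained, 0.95) + 1)  # smallest k explaining >= 95%
Psi = Vt[:k]                                   # (k, 117) projection matrix

omega_t = rng.normal(size=(k, k))              # hypothetical Omega trained in PCA space
lam_t = omega_t.T @ omega_t                    # Lambda-tilde
lam = Psi.T @ lam_t @ Psi                      # Lambda in the original feature space

x, w = X[0], X[1]
diff_pca = Psi @ (x - w)                       # the PCA mean cancels in the difference
d_pca = diff_pca @ lam_t @ diff_pca
d_orig = (x - w) @ lam @ (x - w)
```

Both distances agree, which is why the diagonal of `lam` can be read as relevances of the original 117 cytokines although training took place in the reduced space.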
Relevances
[figures: ROC curves (true positive rate vs. false positive rate, leave-one-out) and diagonal relevances $\Lambda_{ii}$ vs. cytokine index i, for (A) established RA vs. uninflamed control and (B) early RA vs. resolving inflammation]
protein level studies
CXCL4 chemokine (C-X-C motif) ligand 4
CXCL7 chemokine (C-X-C motif) ligand 7
direct study on protein level, staining / imaging of synovial tissue:
macrophages: predominant source of CXCL4/7 expression
• high levels of CXCL4 and
CXCL7 in early RA
• expression on macrophages
outside of blood vessels
discriminates
early RA / resolving cases
relevant cytokines
[figure: ROC curves and diagonal relevances $\Lambda_{ii}$ vs. cytokine index i for (A) and (B), leave-one-out; highlighted label: macrophage stimulating 1]
Machine learning analysis of FDG-PET
brain images for the diagnosis of
neurodegenerative disorders
K.L. Leenders, S. Meles, … UMCG Groningen, Neurology
R. van Veen, S. Lövdal Bernoulli Institute, Computer Science
…
data
FDG-PET 3D images (Fluorodeoxyglucose positron emission tomography)
Glucose uptake
http://glimpsproject.com
subjects:
Healthy Controls HC
Parkinson’s Disease PD
Alzheimer’s Disease AD
FDG-PET brain scans from 3 centers
• Clínica Universidad de Navarra (CUN)
• Univ. Genova/IRCCS San Martino (UGOSM)
• Univ. Medical Center Groningen (UMCG)
Subjects
Source   HC   PD   AD
CUN      19   49   -
UGOSM    44   58   55
UMCG     19   20   21
work flow
[diagram labels: subjects × ~200000 voxels; subject specific anatomy; high intensity, low noise voxels; log-transform; double centering; masking (*); low-dimensional projections (*); subject scores × subjects]
(*) Scaled Subprofile Model / PCA based
on a disjoint reference group of subjects
details of pre-processing:
D. Mudali et al.
Computational and Mathematical Methods in Medicine,
March 2015, Art. ID 136921, and references therein
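The pre-processing chain named above (log-transform, double centering, low-dimensional projection based on a reference group) could be sketched as follows. This is a simplified reading of an SSM/PCA-style pipeline, not the procedure of the cited paper: masking is omitted, and the number of components, the array sizes and the function name `log_double_center` are assumptions.

```python
import numpy as np

def log_double_center(scans, group_mean_profile=None):
    """Log-transform, remove each subject's mean over voxels, then a group mean profile."""
    logged = np.log(scans)
    centred = logged - logged.mean(axis=1, keepdims=True)   # per-subject centring
    if group_mean_profile is None:
        group_mean_profile = centred.mean(axis=0)            # per-voxel group mean
    return centred - group_mean_profile, group_mean_profile

rng = np.random.default_rng(4)
reference = rng.uniform(1.0, 2.0, size=(20, 500))   # stand-in: reference group, masked voxels
cohort = rng.uniform(1.0, 2.0, size=(30, 500))      # stand-in: disjoint group to be classified

# principal components are derived from the reference group only
ref_res, profile = log_double_center(reference)
_, _, Vt = np.linalg.svd(ref_res, full_matrices=False)
components = Vt[:5]                                  # 5 components: arbitrary choice

# the cohort is centred with the reference profile and projected onto those components
cohort_res, _ = log_double_center(cohort, profile)
scores = cohort_res @ components.T                   # subject scores, shape (30, 5)
```

Deriving the components from a disjoint reference group keeps the subjects used for classification out of the feature construction step.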
work flow
[diagram: subjects × ~200000 voxels → subjects × subject scores, with labels (condition) → classification: GMLVQ, SVM → applied to novel subject (test): ?]
two classifiers
(A) Perceptron of optimal stability (aka “SVM with linear kernel”)
- linear threshold classifier
- large margin (with errors)
(B) Learning Vector Quantization (Generalized Matrix LVQ)
- prototype- and distance-based classifier
- relevance learning
performance evaluation:
averages over 10 randomized runs of 10-fold cross-validation
accuracies, sensitivity / specificity
Receiver Operating Characteristics for binary classification
both classifiers outperformed Decision Trees in previous projects
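The evaluation protocol above (10 randomized runs of 10-fold cross-validation, ROC-based performance) might be organized as in this sketch. The stratified fold construction, the nearest-class-mean stand-in classifier and all function names are assumptions, since the slides do not specify these details.

```python
import numpy as np

def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count one half)."""
    pos, neg = scores[labels == 1], scores[labels == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

def stratified_folds(y, n_folds, rng):
    """Random folds that contain every class (assumed; not stated on the slide)."""
    folds = [[] for _ in range(n_folds)]
    for c in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == c))
        for k, part in enumerate(np.array_split(idx, n_folds)):
            folds[k].extend(part.tolist())
    return [np.array(f) for f in folds]

def repeated_cv_auc(X, y, fit, score, n_runs=10, n_folds=10, seed=0):
    """Mean and standard deviation of the test AUC over all runs and folds."""
    rng = np.random.default_rng(seed)
    aucs = []
    for _ in range(n_runs):
        for test_idx in stratified_folds(y, n_folds, rng):
            train = np.ones(len(X), dtype=bool)
            train[test_idx] = False
            model = fit(X[train], y[train])
            aucs.append(roc_auc(score(model, X[test_idx]), y[test_idx]))
    return float(np.mean(aucs)), float(np.std(aucs))

def fit_class_means(X, y):
    """Stand-in classifier: one prototype per class at the class mean."""
    return X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)

def score_class_means(model, X):
    """Larger score means closer to the class-1 prototype."""
    m0, m1 = model
    return np.sum((X - m0) ** 2, axis=1) - np.sum((X - m1) ** 2, axis=1)

rng = np.random.default_rng(5)
X = np.vstack([rng.normal(0.0, 1.0, (60, 2)), rng.normal(2.0, 1.0, (60, 2))])
y = np.array([0] * 60 + [1] * 60)
mean_auc, std_auc = repeated_cv_auc(X, y, fit_class_means, score_class_means)
```

Any classifier that provides a `fit` and a real-valued `score` function, such as GMLVQ or a linear SVM, can be plugged into `repeated_cv_auc` in place of the stand-in.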
results
subjects from one center only
here: UGOSM [figure: ROC, unbiased classifiers]
relatively good within-center performance
also in three-class settings
prototypes back-projected
results
subjects from one center only
here: UGOSM, PD vs. AD [figure: ROC, unbiased classifiers]
subjects from centers combined for training and testing
example: UMCG and UGOSM, PD vs. AD
reasonable (yet lower) overall performance
also in the other classification problems
results
here: PD vs. HC [figure: ROC, unbiased classifiers]
within center
(example: UGOSM)
across centers: poor performance
results/conclusions
experiment - classify subjects according to medical center
here: AD patients only, UMCG vs UGOSM
possible explanations:
- center-specific (pre-)processing
despite identical equipment and work flows
- significantly different patient cohorts (not the case)
need for more consistent protocols, calibration / pre-processing
aim: unified classifiers with good inter-center performance
Matlab:
K Bunte: Relevance and Matrix adaptation in Learning Vector
Quantization (GRLVQ, GMLVQ and LiRaM LVQ) -> code
F Westerman, R Veen, M.B: A no-nonsense beginners’ tool for GMLVQ
http://www.cs.rug.nl/~biehl/gmlvq
Python:
sklvq: Scikit Learning Vector Quantization
R van Veen, GJ de Vries, M. Biehl, JMLR 22 (2021), 1-6
scikit-learn compatible LVQ implementations
from the machine learning group at CITEC Bielefeld
Java:
plug-in for WEKA from the CI Group Mittweida
M. Kästner, T. Villmann
https://www.cs.rug.nl/~biehl/

More Related Content

Similar to prototypes-AMALEA.pdf

Neural Networks: Least Mean Square (LSM) Algorithm
Neural Networks: Least Mean Square (LSM) AlgorithmNeural Networks: Least Mean Square (LSM) Algorithm
Neural Networks: Least Mean Square (LSM) Algorithm
Mostafa G. M. Mostafa
 
Sefl Organizing Map
Sefl Organizing MapSefl Organizing Map
Sefl Organizing Map
Nguyen Van Chuc
 
Statistical Physics Studies of Machine Learning Problems by Lenka Zdeborova, ...
Statistical Physics Studies of Machine Learning Problems by Lenka Zdeborova, ...Statistical Physics Studies of Machine Learning Problems by Lenka Zdeborova, ...
Statistical Physics Studies of Machine Learning Problems by Lenka Zdeborova, ...
Paris Women in Machine Learning and Data Science
 
nnml.ppt
nnml.pptnnml.ppt
nnml.ppt
yang947066
 
Machine Learning and Artificial Neural Networks.ppt
Machine Learning and Artificial Neural Networks.pptMachine Learning and Artificial Neural Networks.ppt
Machine Learning and Artificial Neural Networks.ppt
Anshika865276
 
January 2020: Prototype-based systems in machine learning
January 2020: Prototype-based systems in machine learning  January 2020: Prototype-based systems in machine learning
January 2020: Prototype-based systems in machine learning
University of Groningen
 
Paper id 24201464
Paper id 24201464Paper id 24201464
Paper id 24201464IJRAT
 
Multi-Layer Perceptrons
Multi-Layer PerceptronsMulti-Layer Perceptrons
Multi-Layer PerceptronsESCOM
 
2017: Prototype-based models in unsupervised and supervised machine learning
2017: Prototype-based models in unsupervised and supervised machine learning2017: Prototype-based models in unsupervised and supervised machine learning
2017: Prototype-based models in unsupervised and supervised machine learning
University of Groningen
 
Camp IT: Making the World More Efficient Using AI & Machine Learning
Camp IT: Making the World More Efficient Using AI & Machine LearningCamp IT: Making the World More Efficient Using AI & Machine Learning
Camp IT: Making the World More Efficient Using AI & Machine Learning
Krzysztof Kowalczyk
 
Analysis of intelligent system design by neuro adaptive control no restriction
Analysis of intelligent system design by neuro adaptive control no restrictionAnalysis of intelligent system design by neuro adaptive control no restriction
Analysis of intelligent system design by neuro adaptive control no restrictioniaemedu
 
Analysis of intelligent system design by neuro adaptive control
Analysis of intelligent system design by neuro adaptive controlAnalysis of intelligent system design by neuro adaptive control
Analysis of intelligent system design by neuro adaptive controliaemedu
 
An introduction to deep learning
An introduction to deep learningAn introduction to deep learning
An introduction to deep learning
Van Thanh
 
The Positive Effects of Fuzzy C-Means Clustering on Supervised Learning Class...
The Positive Effects of Fuzzy C-Means Clustering on Supervised Learning Class...The Positive Effects of Fuzzy C-Means Clustering on Supervised Learning Class...
The Positive Effects of Fuzzy C-Means Clustering on Supervised Learning Class...
Waqas Tariq
 
The Positive Effects of Fuzzy C-Means Clustering on Supervised Learning Class...
The Positive Effects of Fuzzy C-Means Clustering on Supervised Learning Class...The Positive Effects of Fuzzy C-Means Clustering on Supervised Learning Class...
The Positive Effects of Fuzzy C-Means Clustering on Supervised Learning Class...
CSCJournals
 
Kernel methods in machine learning
Kernel methods in machine learningKernel methods in machine learning
Kernel methods in machine learningbutest
 
OBJECTRECOGNITION1.pptxjjjkkkkjjjjkkkkkkk
OBJECTRECOGNITION1.pptxjjjkkkkjjjjkkkkkkkOBJECTRECOGNITION1.pptxjjjkkkkjjjjkkkkkkk
OBJECTRECOGNITION1.pptxjjjkkkkjjjjkkkkkkk
shesnasuneer
 
OBJECTRECOGNITION1.pptxjjjkkkkjjjjkkkkkkk
OBJECTRECOGNITION1.pptxjjjkkkkjjjjkkkkkkkOBJECTRECOGNITION1.pptxjjjkkkkjjjjkkkkkkk
OBJECTRECOGNITION1.pptxjjjkkkkjjjjkkkkkkk
shesnasuneer
 
Paper id 21201483
Paper id 21201483Paper id 21201483
Paper id 21201483IJRAT
 
Projection methods for stochastic structural dynamics
Projection methods for stochastic structural dynamicsProjection methods for stochastic structural dynamics
Projection methods for stochastic structural dynamics
University of Glasgow
 

Similar to prototypes-AMALEA.pdf (20)

Neural Networks: Least Mean Square (LSM) Algorithm
Neural Networks: Least Mean Square (LSM) AlgorithmNeural Networks: Least Mean Square (LSM) Algorithm
Neural Networks: Least Mean Square (LSM) Algorithm
 
Sefl Organizing Map
Sefl Organizing MapSefl Organizing Map
Sefl Organizing Map
 
Statistical Physics Studies of Machine Learning Problems by Lenka Zdeborova, ...
Statistical Physics Studies of Machine Learning Problems by Lenka Zdeborova, ...Statistical Physics Studies of Machine Learning Problems by Lenka Zdeborova, ...
Statistical Physics Studies of Machine Learning Problems by Lenka Zdeborova, ...
 
nnml.ppt
nnml.pptnnml.ppt
nnml.ppt
 
Machine Learning and Artificial Neural Networks.ppt
Machine Learning and Artificial Neural Networks.pptMachine Learning and Artificial Neural Networks.ppt
Machine Learning and Artificial Neural Networks.ppt
 
January 2020: Prototype-based systems in machine learning
January 2020: Prototype-based systems in machine learning  January 2020: Prototype-based systems in machine learning
January 2020: Prototype-based systems in machine learning
 
Paper id 24201464
Paper id 24201464Paper id 24201464
Paper id 24201464
 
Multi-Layer Perceptrons
Multi-Layer PerceptronsMulti-Layer Perceptrons
Multi-Layer Perceptrons
 
2017: Prototype-based models in unsupervised and supervised machine learning
2017: Prototype-based models in unsupervised and supervised machine learning2017: Prototype-based models in unsupervised and supervised machine learning
2017: Prototype-based models in unsupervised and supervised machine learning
 
Camp IT: Making the World More Efficient Using AI & Machine Learning
Camp IT: Making the World More Efficient Using AI & Machine LearningCamp IT: Making the World More Efficient Using AI & Machine Learning
Camp IT: Making the World More Efficient Using AI & Machine Learning
 
Analysis of intelligent system design by neuro adaptive control no restriction
Analysis of intelligent system design by neuro adaptive control no restrictionAnalysis of intelligent system design by neuro adaptive control no restriction
Analysis of intelligent system design by neuro adaptive control no restriction
 
Analysis of intelligent system design by neuro adaptive control
Analysis of intelligent system design by neuro adaptive controlAnalysis of intelligent system design by neuro adaptive control
Analysis of intelligent system design by neuro adaptive control
 
An introduction to deep learning
An introduction to deep learningAn introduction to deep learning
An introduction to deep learning
 
The Positive Effects of Fuzzy C-Means Clustering on Supervised Learning Class...
The Positive Effects of Fuzzy C-Means Clustering on Supervised Learning Class...The Positive Effects of Fuzzy C-Means Clustering on Supervised Learning Class...
The Positive Effects of Fuzzy C-Means Clustering on Supervised Learning Class...
 
The Positive Effects of Fuzzy C-Means Clustering on Supervised Learning Class...
The Positive Effects of Fuzzy C-Means Clustering on Supervised Learning Class...The Positive Effects of Fuzzy C-Means Clustering on Supervised Learning Class...
The Positive Effects of Fuzzy C-Means Clustering on Supervised Learning Class...
 
Kernel methods in machine learning
Kernel methods in machine learningKernel methods in machine learning
Kernel methods in machine learning
 
OBJECTRECOGNITION1.pptxjjjkkkkjjjjkkkkkkk
OBJECTRECOGNITION1.pptxjjjkkkkjjjjkkkkkkkOBJECTRECOGNITION1.pptxjjjkkkkjjjjkkkkkkk
OBJECTRECOGNITION1.pptxjjjkkkkjjjjkkkkkkk
 
OBJECTRECOGNITION1.pptxjjjkkkkjjjjkkkkkkk
OBJECTRECOGNITION1.pptxjjjkkkkjjjjkkkkkkkOBJECTRECOGNITION1.pptxjjjkkkkjjjjkkkkkkk
OBJECTRECOGNITION1.pptxjjjkkkkjjjjkkkkkkk
 
Paper id 21201483
Paper id 21201483Paper id 21201483
Paper id 21201483
 
Projection methods for stochastic structural dynamics
Projection methods for stochastic structural dynamicsProjection methods for stochastic structural dynamics
Projection methods for stochastic structural dynamics
 

More from University of Groningen

Interpretable machine learning in endocrinology, M. Biehl, APPIS 2024
Interpretable machine learning in endocrinology, M. Biehl, APPIS 2024Interpretable machine learning in endocrinology, M. Biehl, APPIS 2024
Interpretable machine learning in endocrinology, M. Biehl, APPIS 2024
University of Groningen
 
ESE-Eyes-2023.pdf
ESE-Eyes-2023.pdfESE-Eyes-2023.pdf
ESE-Eyes-2023.pdf
University of Groningen
 
APPIS-FDGPET.pdf
APPIS-FDGPET.pdfAPPIS-FDGPET.pdf
APPIS-FDGPET.pdf
University of Groningen
 
stat-phys-appis-reduced.pdf
stat-phys-appis-reduced.pdfstat-phys-appis-reduced.pdf
stat-phys-appis-reduced.pdf
University of Groningen
 
stat-phys-AMALEA.pdf
stat-phys-AMALEA.pdfstat-phys-AMALEA.pdf
stat-phys-AMALEA.pdf
University of Groningen
 
Evidence for tissue and stage-specific composition of the ribosome: machine l...
Evidence for tissue and stage-specific composition of the ribosome: machine l...Evidence for tissue and stage-specific composition of the ribosome: machine l...
Evidence for tissue and stage-specific composition of the ribosome: machine l...
University of Groningen
 
The statistical physics of learning revisted: Phase transitions in layered ne...
The statistical physics of learning revisted: Phase transitions in layered ne...The statistical physics of learning revisted: Phase transitions in layered ne...
The statistical physics of learning revisted: Phase transitions in layered ne...
University of Groningen
 
Interpretable machine-learning (in endocrinology and beyond)
Interpretable machine-learning (in endocrinology and beyond)Interpretable machine-learning (in endocrinology and beyond)
Interpretable machine-learning (in endocrinology and beyond)
University of Groningen
 
Biehl hanze-2021
Biehl hanze-2021Biehl hanze-2021
Biehl hanze-2021
University of Groningen
 
2020: Prototype-based classifiers and relevance learning: medical application...
2020: Prototype-based classifiers and relevance learning: medical application...2020: Prototype-based classifiers and relevance learning: medical application...
2020: Prototype-based classifiers and relevance learning: medical application...
University of Groningen
 
2020: Phase transitions in layered neural networks: ReLU vs. sigmoidal activa...
2020: Phase transitions in layered neural networks: ReLU vs. sigmoidal activa...2020: Phase transitions in layered neural networks: ReLU vs. sigmoidal activa...
2020: Phase transitions in layered neural networks: ReLU vs. sigmoidal activa...
University of Groningen
 
2020: So you thought the ribosome was constant and conserved ...
2020: So you thought the ribosome was constant and conserved ... 2020: So you thought the ribosome was constant and conserved ...
2020: So you thought the ribosome was constant and conserved ...
University of Groningen
 
Prototype-based classifiers and their applications in the life sciences
Prototype-based classifiers and their applications in the life sciencesPrototype-based classifiers and their applications in the life sciences
Prototype-based classifiers and their applications in the life sciences
University of Groningen
 
Prototype-based models in machine learning
Prototype-based models in machine learningPrototype-based models in machine learning
Prototype-based models in machine learning
University of Groningen
 
The statistical physics of learning - revisited
The statistical physics of learning - revisitedThe statistical physics of learning - revisited
The statistical physics of learning - revisited
University of Groningen
 
2013: Sometimes you can trust a rat - The sbv improver species translation ch...
2013: Sometimes you can trust a rat - The sbv improver species translation ch...2013: Sometimes you can trust a rat - The sbv improver species translation ch...
2013: Sometimes you can trust a rat - The sbv improver species translation ch...
University of Groningen
 
2013: Prototype-based learning and adaptive distances for classification
2013: Prototype-based learning and adaptive distances for classification2013: Prototype-based learning and adaptive distances for classification
2013: Prototype-based learning and adaptive distances for classification
University of Groningen
 
2015: Distance based classifiers: Basic concepts, recent developments and app...
2015: Distance based classifiers: Basic concepts, recent developments and app...2015: Distance based classifiers: Basic concepts, recent developments and app...
2015: Distance based classifiers: Basic concepts, recent developments and app...
University of Groningen
 
2016: Classification of FDG-PET Brain Data
2016: Classification of FDG-PET Brain Data2016: Classification of FDG-PET Brain Data
2016: Classification of FDG-PET Brain Data
University of Groningen
 
2016: Predicting Recurrence in Clear Cell Renal Cell Carcinoma
2016: Predicting Recurrence in Clear Cell Renal Cell Carcinoma2016: Predicting Recurrence in Clear Cell Renal Cell Carcinoma
2016: Predicting Recurrence in Clear Cell Renal Cell Carcinoma
University of Groningen
 

prototypes-AMALEA.pdf

  • 1. Prototype-based machine learning: bio-medical applications. Michael Biehl, Bernoulli Institute for Mathematics, Computer Science and Artificial Intelligence, University of Groningen, The Netherlands, www.cs.rug.nl/~biehl
  • 2. AMALEA 2022. Review: WIREs Cognitive Science (2016)
  • 7. overview
      1. Introduction / Motivation: prototypes and exemplars, neural activation / learning
      2. Unsupervised Learning: Competitive Learning, Kohonen’s Self-Organizing Map (SOM)
      3. Supervised Learning: Learning Vector Quantization (LVQ), adaptive distances and Relevance Learning
      Examples and illustrations: bio-medical applications
      - clustering of proteomics data
      - biomarkers for rheumatoid arthritis
      - (FDG-PET brain scans)
  • 13. 1. Introduction
      prototypes, exemplars: representation of information in terms of typical representatives (e.g. of a class of objects); a much-debated concept in cognitive psychology
      neural activation: external stimulus to a network of neurons; response according to weights (expected inputs): best matching unit (and neighbors); weights represent different expected stimuli (prototypes)
      learning: a change of weights results in an even stronger response to similar stimuli in the future
  • 14. even independent of the above: an attractive framework for machine-learning-based data analysis
      - trained system is parameterized in the feature space
      - facilitates discussions with domain experts
      - transparent (white box) and provides insights into the applied criteria (classification, regression, clustering etc.)
      - easy to implement, efficient computation
      - versatile, successfully applied in many different application areas
  • 17. 2. Unsupervised Learning
      Potential aims:
      - dimension reduction: compression, visualization, ...
      - exploration of data structure: clustering, density estimation, ...
      - pre-processing: supervised learning, classification, regression, ...
      Vector Quantization: identify (few) typical representatives $w^1, w^2, \ldots, w^K$ with $w^k \in \mathbb{R}^N$ from a set of feature vectors $x^1, x^2, \ldots, x^P$ with $x^\mu \in \mathbb{R}^N$
      assign $x^\mu$ to the winning prototype $w^* = \operatorname{argmin}_j \, d(w^j, x^\mu)$, for instance w.r.t. the squared Euclidean distance $d(w, x) = \sum_{n=1}^{N} (w_n - x_n)^2$
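The assignment step of Vector Quantization on this slide can be illustrated with a minimal Python sketch. It assumes the squared Euclidean distance named on the slide; the function names and the toy prototypes are hypothetical and only serve the illustration.

```python
def squared_euclidean(w, x):
    """Squared Euclidean distance: sum over components of (w_n - x_n)^2."""
    return sum((wn - xn) ** 2 for wn, xn in zip(w, x))


def winner(prototypes, x):
    """Index j of the prototype closest to x (ties go to the lowest index)."""
    return min(range(len(prototypes)),
               key=lambda j: squared_euclidean(prototypes[j], x))


# hypothetical toy example: three prototypes in two dimensions
prototypes = [[0.0, 0.0], [1.0, 1.0], [4.0, 0.0]]
x = [0.9, 0.7]
print(winner(prototypes, x))  # index of the winning prototype for x
```

Any other dissimilarity measure could be swapped in for `squared_euclidean` without changing the assignment logic.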
  • 20. competitive learning
      - initially: randomized $w^k$; then a random sequence of single data is presented
      - competition for updates, “the winner takes it all”: $w^* \to w^* + \eta \, (x^\mu - w^*)$ with learning rate / step size $\eta < 1$
      competitive VQ = stochastic gradient descent w.r.t. the Quantization Error (QE):
      - assign each data point to the closest prototype
      - measure the corresponding distance (e.g. squared Euclidean)
      - sum over all assigned data points
      The QE measures the quality of the representation and defines one possible criterion to evaluate / compare the quality of different prototype configurations.
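A minimal sketch of winner-takes-all training and of the quantization error, written in plain Python under the assumption of squared Euclidean distances. The function names, the default learning rate and the number of epochs are illustrative choices, not taken from the slides.

```python
import random


def sq_dist(w, x):
    """Squared Euclidean distance between prototype w and data point x."""
    return sum((wn - xn) ** 2 for wn, xn in zip(w, x))


def wta_update(prototypes, x, eta):
    """One winner-takes-all step (in place); returns the winner's index."""
    j = min(range(len(prototypes)), key=lambda i: sq_dist(prototypes[i], x))
    # only the winner moves: w* <- w* + eta * (x - w*)
    prototypes[j] = [wn + eta * (xn - wn) for wn, xn in zip(prototypes[j], x)]
    return j


def quantization_error(prototypes, data):
    """Sum over all data points of the squared distance to the closest prototype."""
    return sum(min(sq_dist(w, x) for w in prototypes) for x in data)


def train(data, prototypes, eta=0.05, epochs=20, seed=0):
    """Present the data as a random sequence of single examples, epoch by epoch."""
    rng = random.Random(seed)
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # shuffled copy of the data
            wta_update(prototypes, x, eta)
    return prototypes
```

Calling `quantization_error` before and after `train` gives a simple way to compare prototype configurations, keeping in mind that the outcome depends on the initialization.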
  • 25. competitive learning
      [figure: data, initial prototypes, WTA training, dead units]
      general problem: local minima of the quantization error, initialization-dependent outcome of training
      improvement: rank-based updates (winner, second, third, ...)
      Neural Gas: introduce rank-based neighborhood cooperativeness; many prototypes to represent the density of data [Martinetz, Berkovich, Schulten, IEEE Trans. Neural Netw. 1993]
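A rank-based update in the spirit of Neural Gas can be sketched as follows. The slide only states that updates depend on the rank (winner, second, third, ...); the exponential weighting `exp(-rank / lam)` used here is the commonly used Neural Gas form and is an assumption, as are the function and parameter names.

```python
import math


def neural_gas_update(prototypes, x, eta, lam):
    """Update all prototypes in place, weighted by their distance rank w.r.t. x.

    rank 0 is the winner, rank 1 the second closest, and so on;
    lam controls how quickly the update strength decays with the rank.
    Returns the prototype indices ordered by increasing distance.
    """
    dists = [sum((wn - xn) ** 2 for wn, xn in zip(w, x)) for w in prototypes]
    order = sorted(range(len(prototypes)), key=lambda j: dists[j])
    for rank, j in enumerate(order):
        h = math.exp(-rank / lam)
        prototypes[j] = [wn + eta * h * (xn - wn)
                         for wn, xn in zip(prototypes[j], x)]
    return order
```

Because every prototype receives at least a small update, units that would never win under pure winner-takes-all training are still drawn towards the data.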
  • 29. Self-Organizing Map [T. Kohonen. Self-Organizing Maps. Springer (1995)]
      neighborhood cooperativeness on a predefined low-dim. lattice
      lattice $A$ of neurons, i.e. prototypes $w^r \in \mathbb{R}^N$ at positions $r \in \mathbb{R}^d$
      upon presentation of $x^\mu$:
      - determine the winner (best matching unit) $w^s$ in feature space (at position $s$ in $A$)
      - update winner and lattice neighborhood: $w^r \to w^r + \eta \, h_\rho(r, s) \, (x^\mu - w^r)$, where $h_\rho(r, s) = \exp\left( - \frac{\| r - s \|_A^2}{2 \rho^2} \right)$ with range $\rho$ w.r.t. distances in lattice $A$
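One SOM training step, as described on this slide, might look like the sketch below. It assumes a Gaussian neighborhood function on the lattice and squared Euclidean distances in feature space; storing the map as a dictionary from lattice positions to prototypes is an illustrative choice, and all names are hypothetical.

```python
import math


def som_update(weights, x, eta, rho):
    """One SOM step (in place); returns the lattice position s of the winner.

    weights: dict mapping a lattice position r (tuple) to its prototype (list)
    eta:     learning rate
    rho:     range of the neighborhood function, measured in lattice distances
    """
    # winner: best matching unit, determined in feature space
    s = min(weights,
            key=lambda r: sum((wn - xn) ** 2 for wn, xn in zip(weights[r], x)))
    for r, w in weights.items():
        lattice_sq = sum((ri - si) ** 2 for ri, si in zip(r, s))
        h = math.exp(-lattice_sq / (2.0 * rho ** 2))  # Gaussian neighborhood
        weights[r] = [wn + eta * h * (xn - wn) for wn, xn in zip(w, x)]
    return s
```

In practice the range `rho` and the learning rate `eta` are often decreased during training, so that the map first unfolds globally and then adapts locally.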
  • 32. Self-Organizing Map
      - lattice deforms in $\mathbb{R}^N$, reflecting the density of observations [figure © Wikipedia]
      SOM provides a topology-preserving low-dim. representation for inspection and visualization of structured datasets
      e.g. SOM toolbox (Matlab), Helsinki University of Technology, http://www.cis.hut.fi/somtoolbox/
      - select lattice type, size and shape (PCA-based defaults)
      - select training prescription, training time etc.
      - compute quantization error etc.
      - visualize the SOM grid, distances between prototypes ...
  • 33. Remarks
      - many extensions of the basic concept, e.g. cost function based SOM [Heskes]; Generative Topographic Map (GTM), probabilistic formulation of the mapping to low-dim. lattice [Bishop, Svensen, Williams, 1998]
      - specific modifications of SOM or Neural Gas for
          - time series / functional data
          - “non-vectorial” relational data
          - graphs and trees
          - supervised learning
  • 34. example application
  • 39. the “central dogma” of molecular biology
      transcription: DNA → (m)RNA
      translation: mRNA → proteins [Ribosome]
      protein → function
  • 47. the ribosome…
      - is an ancient molecular machine, ‘3D-printer’ for proteins
      - ~ $10^7$ cytoplasmic ribosomes per cell
      - is believed to have universal function and the same composition in different tissues and across species
      - consists of RNA and ribosomal proteins (RP), which are also coded by DNA that is transcribed to mRNA
      here: analysis of RP mRNA expression
  • 50. mRNA expression: public domain data sets
      - Klijn et al., Nature Biotechnol. 33(3): 306-312 (2015): 675 cell lines
      - GTEx (v6p), www.gtexportal.org: 8,555 normal samples from 53 different tissues (with >50 samples)
      - TCGA (NCI-GDC, v7), www.cancer.gov: 10,363 tumor samples, 730 tumor-adjacent normals
      normalization: constant sum of reads for each of the 78 RP; depending on method: log-transform, z-score
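The normalization steps named on this slide can be sketched in plain Python. The slide does not give the details, so this sketch rests on assumptions: each sample is taken to be a list of read counts for the RP genes, the counts are rescaled to a constant sum per sample, and the target sum, the pseudocount and all function names are hypothetical.

```python
import math


def normalize_constant_sum(sample, total=1.0e6):
    """Rescale one sample so that its read counts add up to `total`."""
    s = sum(sample)
    return [total * v / s for v in sample]


def log_transform(sample, pseudocount=1.0):
    """log2 transform; the pseudocount avoids log(0) for unexpressed genes."""
    return [math.log2(v + pseudocount) for v in sample]


def zscore_columns(samples):
    """z-score every feature (column) across samples, using the population std."""
    n = len(samples)
    cols = list(zip(*samples))
    means = [sum(c) / n for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / n)
            for c, m in zip(cols, means)]
    return [[(v - m) / sd if sd > 0 else 0.0
             for v, m, sd in zip(row, means, stds)]
            for row in samples]
```

Which of these steps is applied would depend on the downstream method, as the slide indicates ("depending on method: log-transform, z-score").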
  • 51. normal samples (GTEx): PCA [figure: whole blood - brain tissues - rest]; different tissues have different RP mRNA signatures
  • 54. normal samples: SOM, labelled post-hoc
      [figure, two SOM grids: individual tissue labels (majority assignments), numbered 1 to 53; suggested groups of tissues, numbered 1 to 15]
      legend of the suggested groups: rest, ovary, liver, adrenal gland, pancreas, muscle, heart, cells (fibroblasts), cells (ebv), skin, pituitary, brain (rest), cerebellum, testis, blood
  • 55. normal samples: SOM, labelled post-hoc; different tissues have different RP mRNA signatures (figure annotation: if majority < 50%)
  • 59. summary/conclusion. Main findings (only very few presented here, see NAR paper): RP mRNA signatures vary with tissue type, tumor type and sub-type, and developmental stage; RP translation rates are proportional to RP mRNA levels; RP mRNA and profiling differ in cell cultures vs. cells in vivo. Speculative (yet plausible) conclusions: RP composition and function (?) is tissue-, tumor-, development-, and environment-specific; adds a novel layer to the regulatory network of the cell; might play an important role in cancer. Caveats: composition could be independent of RP abundance; possible extra-ribosomal functions of RP; direct inspection of the ribosome is difficult.
  • 64. supervised learning: classification / regression / prediction based on labeled example data. Generic workflow: training (example data → model), working (apply the model to novel data), validation (estimate working performance; set parameters of model / training; compare different models). Obvious performance measures: overall / class-wise accuracy, ROC, Precision-Recall, ... Accuracy is not enough: interpretable “white-box” systems; example: prototype-based models, distance-based classifiers.
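The performance measures named on this slide can be sketched in a few lines. This is a minimal illustration on made-up labels and scores, not code from the talk; the function names are hypothetical, and the AUC is computed as the fraction of (positive, negative) pairs that are ranked correctly.

```python
import numpy as np

def accuracy(y_true, y_pred):
    """Overall accuracy: fraction of correctly classified samples."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))

def class_wise_accuracy(y_true, y_pred):
    """Accuracy computed separately for the samples of each class."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return {int(c): float(np.mean(y_pred[y_true == c] == c))
            for c in np.unique(y_true)}

def roc_auc(y_true, scores):
    """Area under the ROC curve for binary labels (1 = positive),
    via pairwise comparison of positive and negative scores."""
    y_true, scores = np.asarray(y_true), np.asarray(scores, dtype=float)
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    diff = pos[:, None] - neg[None, :]
    return float(np.mean(diff > 0) + 0.5 * np.mean(diff == 0))

# made-up example: four samples, classifier scores, threshold 0.5
y_true = np.array([0, 0, 1, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8])
y_pred = (scores >= 0.5).astype(int)
overall = accuracy(y_true, y_pred)
per_class = class_wise_accuracy(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

Overall accuracy can hide a poorly recognized class, which is why the class-wise figures and the threshold-independent ROC are reported alongside it.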
  • 69. distance-based classifiers. A simple distance-based system, the NN classifier: store a set of labeled examples in the N-dim. feature space; classify a query according to the label of the Nearest Neighbor in the data set; piece-wise linear decision boundaries result according to, e.g., the (squared) Euclidean distance $d(x^\mu, x^\nu) = \sum_{j=1}^{N} (x^\mu_j - x^\nu_j)^2$. (+) conceptually simple, no training phase. (-) expensive (storage, computation), sensitive to mislabeled data, overly complex decision boundaries.
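The NN classifier described here can be sketched as follows. The toy data and function names are made up for illustration; the distance is the squared Euclidean distance from the slide.

```python
import numpy as np

def squared_euclidean(a, b):
    """d(a, b) = sum_j (a_j - b_j)^2"""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(diff @ diff)

def nn_classify(query, examples, labels):
    """Return the label of the stored example closest to the query."""
    distances = [squared_euclidean(query, x) for x in examples]
    return labels[int(np.argmin(distances))]

# toy data (hypothetical): two classes in a 2-dim. feature space
examples = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.2]])
labels = ["A", "A", "B", "B"]
prediction = nn_classify([0.8, 0.9], examples, labels)
```

There is no training step: every query is compared against all stored examples, which is the storage and computation cost listed among the drawbacks.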
  • 73. prototype-based classification. A prototype-based classifier [Kohonen 1990]: represent the data by one or several prototypes per class in the N-dim. feature space; classify a query according to the label of the nearest prototype (or alternative schemes); local decision boundaries acc. to Euclidean distances from the prototypes; piece-wise linear class borders parameterized by prototypes. (+) less sensitive to outliers, lower storage needs, little computational effort in the working phase. (-) training phase required in order to place prototypes; model selection problem: number of prototypes per class etc.
  • 83. Learning Vector Quantization (N-dimensional data, feature vectors): identification of prototype vectors from labeled example data; distance-based classification (e.g. Euclidean). Competitive learning, heuristic LVQ1 [Kohonen, 1990]: initialize prototype vectors for each class; present a single example; identify the winner (closest prototype); move the winner closer towards the data (same class) or away from the data (different class). Many variants exist, including cost-function-based schemes, e.g. Generalized LVQ (cost function approximates the number of misclassifications).
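The LVQ1 steps listed above (initialize, present an example, find the winner, attract or repel it) can be sketched as below. The learning rate `eta`, the number of epochs, the initialization and the toy data are assumptions for illustration and are not taken from the talk.

```python
import numpy as np

def lvq1_train(X, y, prototypes, proto_labels, eta=0.05, epochs=30, seed=0):
    """Heuristic LVQ1: move the winning prototype towards the example
    (same class) or away from it (different class)."""
    rng = np.random.default_rng(seed)
    W = np.array(prototypes, dtype=float)
    for _ in range(epochs):
        for i in rng.permutation(len(X)):        # present single examples
            d = np.sum((W - X[i]) ** 2, axis=1)  # squared Euclidean distances
            k = int(np.argmin(d))                # winner = closest prototype
            sign = 1.0 if proto_labels[k] == y[i] else -1.0
            W[k] += sign * eta * (X[i] - W[k])
    return W

def lvq_predict(X, W, proto_labels):
    """Nearest-prototype classification."""
    d = ((X[:, None, :] - W[None, :, :]) ** 2).sum(axis=2)
    return np.asarray(proto_labels)[np.argmin(d, axis=1)]

# toy data (hypothetical): two well-separated Gaussian clusters
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.3, (50, 2)), rng.normal(2.0, 0.3, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
proto_labels = [0, 1]
W0 = np.array([[0.8, 0.8], [1.2, 1.2]])  # one prototype per class
W = lvq1_train(X, y, W0, proto_labels)
accuracy = float(np.mean(lvq_predict(X, W, proto_labels) == y))
```

After training only the prototypes are kept, so the working phase needs one distance computation per prototype instead of one per stored example.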
  • 84. Learning Vector Quantization (N-dimensional data, feature vectors): identification of prototype vectors from labeled example data; distance-based classification (e.g. squared Euclidean; here: Euclidean distances); tessellation of feature space [piece-wise linear]; aim: generalization ability, i.e. correct classification of new data.
  • 88. LVQ distance measures. Key question: what is an appropriate distance / (dis-)similarity measure? Fixed, pre-defined distance measures: (G)LVQ can be formulated for general (differentiable) distances; examples: Minkowski distances (p≠2), correlation-based measures, statistical divergences, ... (not necessarily metrics!). Standard work-flow: consider several distance measures; compare performances in, e.g., cross-validation. Elegant approach, Relevance Learning / adaptive distances: employ a parameterized distance measure; optimize it in the data-driven training process (cost function!).
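That the choice of distance measure matters can be seen in a small sketch: the same query is assigned to different prototypes under Minkowski distances with p = 1 and p = 2. The prototypes, labels and helper names are made up for illustration.

```python
import numpy as np

def minkowski(x, w, p=2.0):
    """Minkowski distance (sum_j |x_j - w_j|^p)^(1/p)."""
    diff = np.abs(np.asarray(x, dtype=float) - np.asarray(w, dtype=float))
    return float(np.sum(diff ** p) ** (1.0 / p))

def nearest_prototype(x, prototypes, proto_labels, distance):
    """Label of the closest prototype under a pluggable distance."""
    d = [distance(x, w) for w in prototypes]
    return proto_labels[int(np.argmin(d))]

# hypothetical prototypes and query
prototypes = np.array([[3.0, 3.0], [0.0, 4.5]])
proto_labels = ["a", "b"]
x = np.array([0.0, 0.0])

label_p1 = nearest_prototype(x, prototypes, proto_labels,
                             lambda u, v: minkowski(u, v, p=1))
label_p2 = nearest_prototype(x, prototypes, proto_labels,
                             lambda u, v: minkowski(u, v, p=2))
```

Passing the distance as a function keeps the classifier unchanged while several measures are compared, for instance inside a cross-validation loop.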
  • 93. Generalized Matrix Relevance LVQ (GMLVQ) [Schneider, Biehl, Hammer, 2009]: generalized quadratic distance in LVQ, $d(w, x) = (w - x)^\top \Lambda \, (w - x) = [\,\Omega \, (w - x)\,]^2$ with $\Lambda = \Omega^\top \Omega$. Training: adaptation of prototypes and distance measure guided by the GLVQ cost function. Variants: one global, several local, or class-wise relevance matrices; rectangular $\Omega$ for low-dim. representation / visualization [Bunte et al., 2012]; diagonal matrices for single feature weights [Hammer et al., 2002].
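A minimal sketch of the generalized quadratic distance, assuming the usual parameterization $\Lambda = \Omega^\top \Omega$, which keeps the distance non-negative. The helper `glvq_mu` shows the relative distance difference that enters the GLVQ cost function in its standard form; all names and numbers are illustrative, and no training loop is included.

```python
import numpy as np

def gmlvq_distance(w, x, omega):
    """d(w, x) = (w - x)^T Lambda (w - x) with Lambda = Omega^T Omega,
    evaluated as the squared norm of Omega (w - x)."""
    z = omega @ (np.asarray(w, dtype=float) - np.asarray(x, dtype=float))
    return float(z @ z)

def relevance_matrix(omega):
    """Lambda = Omega^T Omega (symmetric, positive semi-definite)."""
    return omega.T @ omega

def glvq_mu(d_correct, d_wrong):
    """Relative distance difference; negative if the closest correct
    prototype is nearer than the closest wrong one."""
    return (d_correct - d_wrong) / (d_correct + d_wrong)

# hypothetical 2-dim. example
omega = np.array([[1.0, 2.0], [0.0, 1.0]])
w = np.array([1.0, 0.0])
x = np.array([0.0, 1.0])
d_omega = gmlvq_distance(w, x, omega)
lam = relevance_matrix(omega)
d_lambda = float((w - x) @ lam @ (w - x))
```

Both ways of evaluating the distance agree; adapting $\Omega$ rather than $\Lambda$ directly is what guarantees a valid (non-negative) distance throughout training.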
  • 97. “But this is just Mahalanobis distance…” No. Mahalanobis distance [Mahalanobis, 1936] (‘two point version’): $d_M(x, y) = \sqrt{(x - y)^\top S^{-1} (x - y)}$, with $S$ the covariance matrix of random vectors $x \in \mathbb{R}^N$ (calculated once from the data, fixed definition, not adaptive). “So it is a generalized Mahalanobis distance?” If you insist… (in the same spirit: “a generalized broccoli”; $E = \hbar\omega$ as “a generalization of Ohm’s Law”).
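For contrast with an adaptive relevance matrix, the Mahalanobis distance uses a matrix that is computed once from the data and then left fixed. A short sketch with made-up data:

```python
import numpy as np

def mahalanobis(x, y, S):
    """d_M(x, y) = sqrt((x - y)^T S^{-1} (x - y)) for a fixed covariance S."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.sqrt(diff @ np.linalg.solve(S, diff)))

# S is estimated once from (hypothetical) data; no labels, no training
rng = np.random.default_rng(0)
data = rng.normal(size=(200, 2)) * np.array([2.0, 1.0])
S = np.cov(data, rowvar=False)
d_data = mahalanobis(data[0], data[1], S)

# sanity checks with hand-picked matrices
d_identity = mahalanobis([0.0, 0.0], [3.0, 4.0], np.eye(2))       # Euclidean
d_scaled = mahalanobis([0.0, 0.0], [2.0, 0.0], np.diag([4.0, 1.0]))
```

Nothing in this computation depends on class labels or on a cost function, which is the difference to relevance learning stressed on the slide.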
  • 100. interpretation. After training: prototypes represent typical class properties or subtypes (hope). Relevance Matrix: $\Lambda_{ij}$ quantifies the contribution of the pair of features (i,j) to the distance; the diagonal element $\Lambda_{ii}$ summarizes the contribution of a single dimension, i.e. the relevance of the original features in the classifier. Note: this interpretation implicitly assumes that features have equal order of magnitude, e.g. after z-score transformation (with averages taken over the data set).
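The two ingredients of this interpretation, z-score transformation of the features and reading off the diagonal of the relevance matrix, can be sketched as follows. The data and the relevance matrix are made up; `zscore` assumes that no feature is constant.

```python
import numpy as np

def zscore(X):
    """Per-feature z-score: subtract the mean and divide by the standard
    deviation, both taken over the data set (assumes non-constant features)."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0)

def rank_features(lam):
    """Diagonal relevances Lambda_ii and feature indices sorted by relevance."""
    relevances = np.diag(lam)
    order = np.argsort(relevances)[::-1]
    return relevances, order

# hypothetical data with features of very different magnitude
X = np.array([[1.0, 100.0], [2.0, 300.0], [3.0, 200.0]])
Z = zscore(X)

# hypothetical relevance matrix obtained after training on Z
lam = np.array([[0.7, 0.1], [0.1, 0.3]])
relevances, order = rank_features(lam)
```

Without the rescaling, a feature measured in large units would need a small matrix entry to contribute the same amount to the distance, and the diagonal would no longer be comparable across features.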
  • 102. tumor classification: Urine Steroid Metabolomics as a Biomarker Tool for Detecting Malignancy in Patients with Adrenal Tumors. W. Arlt, M. Biehl, A. Taylor, S. Hahner, R. Libé, B. Hughes, P. Schneider, D. Smith, H. Stiekema, N. Krone, E. Porfiri, G. Opocher, J. Bertherat, F. Mantero, B. Allolio, M. Terzolo, P. Nightingale, C. Shackleton, X. Bertagna, M. Fassnacht, P. Stewart. J Clinical Endocrinology & Metabolism 96: 3775-3784 (2011). www.ensat.org. Insight: marker selection, patented diagnosis tool. Follow-up: recurrence detection, other disorders, tumor sub-types...
  • 105. two recent application examples. I) cytokine expression data: insights into disease mechanisms of (early) rheumatoid arthritis based on synovial tissue samples; ~50 samples represented by 117 cytokine expressions in synovial tissue; PCA+GMLVQ combined. II) FDG-PET brain scans: ultimate goal: diagnosis of neurodegenerative diseases; ~100 samples, ~200000 voxels per scan; SSM/PCA+GMLVQ combined.
  • 106. Early diagnosis (?) of Rheumatoid Arthritis. Expression of chemokines CXCL4 and CXCL7 by synovial macrophages defines an early stage of rheumatoid arthritis. L. Yeo, N. Adlard, M. Biehl, M. Juarez, M. Snow, C.D. Buckley, A. Filer, K. Raza, D. Scheel-Toellner. Annals of the Rheumatic Diseases 75:763-771 (2016)
  • 107. Rheumatoid Arthritis (RA): chronic inflammatory disease; immune system affects joints; RA leads to deformation and disability
  • 111. rheumatoid arthritis (RA). Patient groups: uninflamed control; early inflammation (resolving vs. early RA); established RA. Ultimate goals: understand pathogenesis and mechanism of progression; cytokine-based diagnosis of RA at the earliest possible stage?
  • 113. synovial tissue cytokine expression: tissue section (synovium) → mRNA extraction → real-time PCR. Panel of 117 cytokines (cell signaling proteins; regulate immune response; produced by, e.g., T-cells, macrophages, lymphocytes, fibroblasts, etc.): IL1A, IL17F, FASL, CXCL4, CCL15, TGFB1, KITLG, IL1B, IL18, CD70, CXCL5, CCL16, TGFB2, MST1, IL1RN, IL19, CD30L, CXCL6, CCL17, TGFB3, SPP1, IL2, IL20, 4-1BB-L, CXCL7, CCL18, EGF, SFRP1, IL3, IL21, TRAIL, CXCL9, CCL19, FGF2, ANXA1, IL4, IL22, RANKL, CXCL10, CCL20, TGFA, TNFRSF13B, IL5, IL23A, TWEAK, CXCL11, CCL21, IGF2, IL6R, IL6, IL24, APRIL, CXCL12, CCL22, VEGFA, NAMPT, IL7, IL25, BAFF, CXCL13, CCL23, VEGFB, C1QTNF3, IL8, IL26, LIGHT, CXCL14, CCL24, MIF, VCAM1, IL9, IL27, TL1A, CXCL16, CCL25, LIF, LGALS1, IL10, IL28A, GITRL, CCL1, CCL26, OSM, LGALS9, IL11, IL29, FASLG, CCL2, CCL27, ADIPOQ, LGALS3, IL12A, IL32, IFNA1, CCL3, CCL28, LEP, LGALS12, IL12B, IL33, IFNA2, CCL4, XCL1, GHRL, IL13, LTA, IFNB1, CCL5, XCL2, RETN, IL14, TNF, IFNG, CCL7, CX3CL1, CTLA4, IL15, LTB, CXCL1, CCL8, CSF1, EPO, IL16, OX40L, CXCL2, CCL11, CSF2, TPO, IL17A, CD40L, CXCL3, CCL13, CSF3, FLT3LG.
  • 117. GMLVQ analysis. Pre-processing: log-transformed expression values; 21 leading principal components explain 95% of the variation, $x \in \mathbb{R}^{117} \to \tilde{x} = \Psi x \in \mathbb{R}^{21}$ (with $\Psi$ denoting the PCA projection). Two binary problems: (A) established RA vs. uninflamed controls; (B) early RA vs. resolving inflammation. 1 prototype per class, global relevance matrix, distance measure: $d(\tilde{x}, \tilde{w}) = (\tilde{x} - \tilde{w})^\top \tilde{\Lambda} \, (\tilde{x} - \tilde{w}) = (x - w)^\top \underbrace{\Psi^\top \tilde{\Lambda} \, \Psi}_{\Lambda} (x - w)$, with $\Lambda$ in the original feature space $\mathbb{R}^N$. Leave-two-out validation (one from each class); evaluation in terms of Receiver Operating Characteristics.
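The back-transformation of the relevance matrix from PCA space to the original features can be checked numerically. The sketch below uses random stand-in data with 6 features and 3 components instead of 117 and 21; `Psi` and `lam_tilde` are placeholder names, and `lam_tilde` is a random positive semi-definite matrix rather than a trained one.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 6))          # stand-in for log-expression data
Xc = X - X.mean(axis=0)

# PCA via SVD: rows of Vt are the principal directions
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
Psi = Vt[:3]                          # projection onto 3 leading components

# stand-in for a relevance matrix learned in the 3-dim. PCA space
A = rng.normal(size=(3, 3))
lam_tilde = A.T @ A

# relevance matrix in the original feature space
lam = Psi.T @ lam_tilde @ Psi

# the distance is the same in both representations
x, w = X[0], X[1]
diff_tilde = Psi @ (x - w)            # the data mean cancels in the difference
d_reduced = float(diff_tilde @ lam_tilde @ diff_tilde)
d_original = float((x - w) @ lam @ (x - w))
```

Training can therefore run in the low-dimensional space while the diagonal of `lam` is still read per original feature, here per cytokine.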
  • 120. [Figure: ROC curves (true positive rate vs. false positive rate, leave-one-out) and diagonal relevances $\Lambda_{ii}$ vs. cytokine index i, for (A) established RA vs. uninflamed control and (B) early RA vs. resolving inflammation]
  • 123. protein level studies. CXCL4: chemokine (C-X-C motif) ligand 4; CXCL7: chemokine (C-X-C motif) ligand 7. Direct study on protein level, staining / imaging of synovial tissue: macrophages are the predominant source of CXCL4/7 expression; high levels of CXCL4 and CXCL7 in early RA; expression on macrophages outside of blood vessels discriminates early RA / resolving cases.
  • 124. relevant cytokines. [Figure: ROC curves (leave-one-out) and diagonal relevances $\Lambda_{ii}$ vs. cytokine index i, for (A) established RA vs. uninflamed control and (B) early RA vs. resolving inflammation; highlighted cytokine: macrophage stimulating 1]
  • 125. Machine learning analysis of FDG-PET brain images for the diagnosis of neurodegenerative disorders. K.L. Leenders, S. Meles, … (UMCG Groningen, Neurology); R. van Veen, S. Lövdal (Bernoulli Institute, Computer Science); …
  • 128. data: FDG-PET (Fluorodeoxyglucose positron emission tomography) 3D images of glucose uptake, http://glimpsproject.com. Conditions: Healthy Controls (HC), Parkinson’s Disease (PD), Alzheimer’s Disease (AD). FDG-PET brain scans from 3 centers: Clínica Universidad de Navarra (CUN), Univ. Genova/IRCCS San Martino (UGOSM), Univ. Medical Center Groningen (UMCG). Subjects per source (HC / PD / AD): CUN 19 / 49 / -; UGOSM 44 / 58 / 55; UMCG 19 / 20 / 21.
  • 131. work flow: subjects, ~200000 voxels each (subject-specific anatomy); masking (*): high-intensity, low-noise voxels; log-transform; double centering; low-dimensional projections (*), giving subject scores for each subject. (*) Scaled Subprofile Model / PCA based on a disjoint reference group of subjects; details of pre-processing: D. Mudali et al., Computational and Mathematical Methods in Medicine, March 2015, Art. ID 136921, and references therein.
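The log-transform, double centering and PCA projection steps can be sketched roughly as below. This is one reading of the work flow, not the pipeline of Mudali et al.: masking is omitted, double centering is taken to mean removing each subject's mean and then the reference group's mean profile, and the random positive arrays only stand in for masked voxel intensities.

```python
import numpy as np

def fit_reference(ref_scans, n_components):
    """Group mean profile and principal directions from a reference group
    (rows = subjects, columns = voxels, strictly positive intensities)."""
    L = np.log(ref_scans)
    L = L - L.mean(axis=1, keepdims=True)      # remove each subject's mean
    profile = L.mean(axis=0)                   # mean profile of the group
    R = L - profile                            # doubly centered residuals
    _, _, Vt = np.linalg.svd(R, full_matrices=False)
    return profile, Vt[:n_components]

def subject_scores(scans, profile, components):
    """Project scans onto the reference components."""
    L = np.log(scans)
    L = L - L.mean(axis=1, keepdims=True)
    return (L - profile) @ components.T

rng = np.random.default_rng(0)
reference = rng.uniform(1.0, 2.0, size=(10, 50))   # hypothetical reference group
new_scans = rng.uniform(1.0, 2.0, size=(5, 50))    # hypothetical other subjects

profile, components = fit_reference(reference, n_components=3)
ref_scores = subject_scores(reference, profile, components)
scores = subject_scores(new_scans, profile, components)
```

Fitting the profile and the components on a disjoint reference group means that the subjects used for training and testing the classifier do not influence the feature extraction.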
  • 134. work flow (continued): subjects, ~200000 voxels each → subject scores; subject scores and labels (condition) → classification: GMLVQ, SVM; the classifier is then applied to a novel subject (test).
  • 137. two classifiers. (A) Perceptron of optimal stability (aka “SVM with linear kernel”): linear threshold classifier; large margin (with errors). (B) Learning Vector Quantization (Generalized Matrix LVQ): prototype- and distance-based classifier; relevance learning. Performance evaluation: averages over 10 randomized runs of 10-fold cross-validation; accuracies, sensitivity / specificity; Receiver Operating Characteristics for binary classification. Both classifiers outperformed Decision Trees in previous projects.
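The evaluation protocol, averages over 10 randomized runs of 10-fold cross-validation, can be sketched as follows. A nearest-class-mean classifier stands in for GMLVQ or the SVM, and the data are made up; only the validation loop is the point here.

```python
import numpy as np

def stratified_folds(y, n_folds, rng):
    """Randomly split sample indices into folds, class by class."""
    folds = [[] for _ in range(n_folds)]
    for c in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == c))
        for j, i in enumerate(idx):
            folds[j % n_folds].append(i)
    return [np.array(f) for f in folds]

def nearest_mean_fit_predict(X_train, y_train, X_test):
    """Placeholder classifier: one class-mean prototype per class."""
    classes = np.unique(y_train)
    means = np.array([X_train[y_train == c].mean(axis=0) for c in classes])
    d = ((X_test[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    return classes[np.argmin(d, axis=1)]

def repeated_cv_accuracy(X, y, fit_predict, n_folds=10, n_runs=10, seed=0):
    """Mean and standard deviation of test accuracy over all runs and folds."""
    rng = np.random.default_rng(seed)
    accs = []
    for _ in range(n_runs):
        for test_idx in stratified_folds(y, n_folds, rng):
            train = np.ones(len(y), dtype=bool)
            train[test_idx] = False
            pred = fit_predict(X[train], y[train], X[test_idx])
            accs.append(np.mean(pred == y[test_idx]))
    return float(np.mean(accs)), float(np.std(accs))

# hypothetical two-class data
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0.0, 0.5, (40, 2)), rng.normal(3.0, 0.5, (40, 2))])
y = np.array([0] * 40 + [1] * 40)
mean_acc, std_acc = repeated_cv_accuracy(X, y, nearest_mean_fit_predict)
```

Any pre-processing that is fitted to data, such as z-scoring or PCA, would have to be repeated inside each training fold to keep the estimate unbiased; the sketch leaves this out.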
  • 139. results: subjects from one center only (here: UGOSM). [Figure: ROC of unbiased classifiers; annotation ±0.008] Relatively good within-center performance, also in three-class settings.
  • 140. prototypes back-projected
  • 143. results, PD vs. AD. [Figures: ROC of unbiased classifiers] Subjects from one center only (here: UGOSM). Subjects from centers combined for training and testing (example: UMCG and UGOSM): reasonable (yet lower) overall performance, also in the other classification problems.
  • 145. results, here: PD vs. HC. [Figures: ROC of unbiased classifiers] Within center (example: UGOSM). Across centers: poor performance.
  • 147. results/conclusions. UMCG vs. UGOSM experiment: classify subjects according to medical center (here: AD patients only). Possible explanations: center-specific (pre-)processing despite identical equipment and work flows; significantly different patient cohorts (not the case). Need for more consistent protocols, calibration / pre-processing. Aim: unified classifiers with good inter-center performance.
  • 148. Matlab: K. Bunte: Relevance and Matrix adaptation in Learning Vector Quantization (GRLVQ, GMLVQ and LiRaM LVQ) -> code; F. Westerman, R. Veen, M.B.: A no-nonsense beginners’ tool for GMLVQ, http://www.cs.rug.nl/~biehl/gmlvq. Python: sklvq: Scikit Learning Vector Quantization, R. van Veen, G.J. de Vries, M. Biehl, JMLR 22 (2021), 1-6, https://www.cs.rug.nl/~biehl/; CITEC Bielefeld: scikit-learn compatible LVQ implementations from the machine learning group at CITEC Bielefeld. Java: plug-in for WEKA from the CI Group Mittweida, M. Kästner, T. Villmann.