Invited lecture on Machine Learning in Medicine at the joint "Integrated Omics" course of Hanze University and University Hospital UMCG, Groningen, The Netherlands
The statistical physics of learning revisited: Phase transitions in layered neural networks (University of Groningen)
"The statistical physics of learning revisited: Phase transitions in layered neural networks"
Physics Colloquium at the University of Leipzig/Germany, June 29, 2021
24 slides, ca. 45 minutes
System for Prediction of Non Stationary Time Series based on the Wavelet Radi... (IJECEIAES)
This paper proposes a hybrid model, the wavelet radial basis function neural network (WRBFNN), and compares its performance with that of the wavelet feedforward neural network (WFFNN) in a forecasting system that considers two input formats, input9 and input17, and four types of non-stationary time series data. The MODWT transform is used to generate wavelet and smooth coefficients, from which selected elements serve as inputs to the RBFNN and FFNN models. Forecasting performance of the WRBFNN and WFFNN models is evaluated with the MAPE and MSE indicators, while their computational cost is compared using the number of epochs and the training time. On stationary benchmark data, all models achieve very high accuracy. The WRBFNN9 model is superior on non-stationary data containing a linear trend, while the WFFNN17 model performs best on non-stationary data with non-linear trend and seasonal components. In terms of computational speed, the WRBFNN model is superior, requiring far fewer epochs and much shorter training time.
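As a rough, pure-NumPy sketch of the input construction the abstract describes (the function and the choice of features are illustrative, not the paper's exact scheme), one level of a Haar MODWT yields the wavelet and smooth coefficients that feed the networks:

```python
import numpy as np

def haar_modwt_level1(x):
    """One level of the MODWT with the Haar filter (circular boundary).

    Returns (wavelet_coeffs, smooth_coeffs), both the same length as x.
    Illustrative stand-in for the paper's MODWT-based input construction."""
    x_prev = np.roll(x, 1)          # X_{t-1}, wrapping around
    w = (x - x_prev) / 2.0          # wavelet (detail) coefficients
    v = (x + x_prev) / 2.0          # smooth (scaling) coefficients
    return w, v

x = np.array([1.0, 3.0, 2.0, 6.0])
w, v = haar_modwt_level1(x)
# one possible input vector: last two detail + last two smooth coefficients
features = np.concatenate([w[-2:], v[-2:]])
```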
A survey on methods and applications of meta-learning with GNNs (Shreya Goyal)
This survey provides a comprehensive review of work combining graph neural networks (GNNs) with meta-learning, including a summary of the methods and their applications in each category. The application of meta-learning to GNNs is a growing and exciting field, and many graph problems stand to benefit from combining the two approaches.
Gabriella Casalino, Nicoletta Del Buono, Corrado Mencar (2011). Subtractive Initialization of Nonnegative Matrix Factorizations for Document Clustering. In Fuzzy Logic and Applications (WILF 2011), pp. 188-195.
The 9th International Workshop on Fuzzy Logic and Applications, August 29-31, 2011, Trani, Italy
Study of Different Multi-instance Learning kNN Algorithms (Editor IJCATR)
Because of its applicability in various fields, multi-instance learning is becoming more popular in machine learning research. Unlike standard supervised learning, multi-instance learning concerns classifying an unknown bag as positive or negative when the labels of the instances within each bag are ambiguous. This paper studies three k-nearest neighbor algorithms, namely Bayesian-kNN, Citation-kNN and Bayesian-Citation-kNN, for solving the multi-instance problem. Similarity between two bags is measured using the Hausdorff distance. A constructive covering algorithm is used to overcome the problem of false positive instances. The paper also briefly reviews the problem definition, the learning algorithms and the experimental datasets of the multi-instance learning framework.
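The bag-level distance these algorithms rely on can be sketched in NumPy; both the classic Hausdorff distance and the minimal variant popularized by Citation-kNN are shown (function names are illustrative):

```python
import numpy as np

def hausdorff(bag_a, bag_b):
    """Classic (maximal) Hausdorff distance between two bags of instances.

    Each bag is an (n_instances, n_features) array."""
    # pairwise Euclidean distances between every pair of instances
    d = np.linalg.norm(bag_a[:, None, :] - bag_b[None, :, :], axis=-1)
    h_ab = d.min(axis=1).max()   # furthest instance of A from its nearest in B
    h_ba = d.min(axis=0).max()   # furthest instance of B from its nearest in A
    return max(h_ab, h_ba)

def minimal_hausdorff(bag_a, bag_b):
    """Minimal Hausdorff distance, the variant used by Citation-kNN:
    the distance between the closest pair of instances across the bags."""
    d = np.linalg.norm(bag_a[:, None, :] - bag_b[None, :, :], axis=-1)
    return d.min()
```

The minimal variant is less sensitive to outlying instances inside a bag, which is why Citation-kNN prefers it.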
The Advancement and Challenges in Computational Physics (PhD Assistance)
For the last five decades, computational physics has been a valuable scientific instrument in physics. It has enabled physicists to understand complex problems better than theoretical and experimental approaches alone. In its early days, computational physics was mostly a research activity, with relatively few organised undergraduate courses.
IMPROVING SUPERVISED CLASSIFICATION OF DAILY ACTIVITIES LIVING USING NEW COST... (csandit)
The growing population of elders in society calls for a new approach to caregiving. By inferring which activities the elderly are performing in their houses, it is possible to determine their physical and cognitive capabilities. In this paper we show the potential of important discriminative classifiers, namely the soft-margin Support Vector Machine (C-SVM), Conditional Random Fields (CRF) and k-Nearest Neighbors (k-NN), for recognizing activities from sensor patterns in a smart home environment. We also address the class imbalance problem in activity recognition, which is known to hinder the learning performance of classifiers. Cost-sensitive learning is attractive in most imbalanced circumstances, but it is difficult to determine the precise misclassification costs in practice. We introduce a new criterion for selecting a suitable cost parameter C for the C-SVM method. Through our evaluation on four real-world imbalanced activity datasets, we demonstrate that C-SVM based on our proposed criterion outperforms state-of-the-art discriminative methods in activity recognition.
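A minimal sketch of the kind of cost-sensitive C-SVM the abstract refers to; scikit-learn's per-class weighting stands in for the paper's own criterion for picking C, and the toy data stands in for the activity datasets:

```python
# Sketch only: shows the scikit-learn knobs a cost-sensitive C-SVM involves
# (the cost parameter C and per-class misclassification weights).
from sklearn.datasets import make_classification
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# toy imbalanced data (90% / 10% class split)
X, y = make_classification(n_samples=600, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

plain = SVC(C=1.0).fit(X_tr, y_tr)
weighted = SVC(C=1.0, class_weight="balanced").fit(X_tr, y_tr)

acc_plain = balanced_accuracy_score(y_te, plain.predict(X_te))
acc_weighted = balanced_accuracy_score(y_te, weighted.predict(X_te))
print(f"plain C-SVM: {acc_plain:.3f}  cost-sensitive C-SVM: {acc_weighted:.3f}")
```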
Multimodal authentication is one of the prime concepts in current real-world applications, and various approaches have been proposed for it. In this paper, an intuitive strategy is proposed as a framework for providing a more secure key in biometric security. First, features are extracted via PCA using SVD from the chosen biometric patterns; key components are then extracted with the LU factorization technique, selected with different key sizes, and combined using a convolution kernel method (the Exponential Kronecker Product, eKP) as a Context-Sensitive Exponent Associative Memory model (CSEAM). Verification proceeds in the same way and is assessed with the MSE measure. This model gives better outcomes than SVD factorization [1] for feature selection. The process is computed for different key sizes and the results are presented.
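The plain Kronecker product that the paper's exponential Kronecker product (eKP) builds on can be illustrated in one NumPy call; the matrices here are illustrative, not actual key components:

```python
import numpy as np

# Combining two key components k1 (m x n) and k2 (p x q) with the
# Kronecker product yields an (m*p) x (n*q) block matrix: each entry of
# k1 is replaced by that entry times the whole of k2.
k1 = np.array([[1, 2],
               [3, 4]])
k2 = np.array([[0, 1],
               [1, 0]])
combined = np.kron(k1, k2)
print(combined.shape)  # (4, 4)
```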
The effect of gamma value on support vector machine performance with differen... (IJECEIAES)
The support vector machine (SVM) is a supervised machine learning algorithm for classification and regression. It is applied in many fields, such as bioinformatics, face recognition, text and hypertext categorization, generalized predictive control and many other areas. The performance of an SVM depends on the parameters used in the training phase, and these settings can have a profound impact on the resulting classifier. This paper investigates SVM performance as a function of the gamma parameter for different kernels, studying its impact on classification efficiency across datasets of various descriptions. The SVM classifier was implemented in Python, and the kernel functions investigated are polynomial, radial basis function (RBF) and sigmoid. All datasets come from the UC Irvine machine learning repository. Overall, the results show an uneven effect of the three kernels on classification accuracy across the datasets. Changing the gamma value influences the polynomial and sigmoid kernels, depending on the dataset, while the RBF kernel is more stable, its accuracy changing only slightly with different gamma values.
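The experiment's core idea can be sketched with scikit-learn: sweep gamma for each kernel and compare test accuracy. Toy data stands in for the UCI datasets used in the paper:

```python
# Sketch: vary gamma for the RBF, polynomial and sigmoid kernels and
# compare test accuracy on a simple nonlinear dataset.
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=400, noise=0.2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

results = {}
for kernel in ("rbf", "poly", "sigmoid"):
    for gamma in (0.01, 0.1, 1.0, 10.0):
        acc = SVC(kernel=kernel, gamma=gamma).fit(X_tr, y_tr).score(X_te, y_te)
        results[(kernel, gamma)] = acc
        print(f"{kernel:7s} gamma={gamma:5.2f} accuracy={acc:.3f}")
```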
KNOWLEDGE BASED ANALYSIS OF VARIOUS STATISTICAL TOOLS IN DETECTING BREAST CANCER (cscpconf)
In this paper, we study the performance of machine learning tools in classifying breast cancer. We compare data mining tools such as Naïve Bayes, support vector machines, radial basis function neural networks, the J48 decision tree and simple CART. We use both binary and multi-class datasets, namely WBC, WDBC and Breast Tissue from the UCI machine learning repository. The experiments are conducted in WEKA. The aim of this research is to find the best classifier with respect to accuracy, precision, sensitivity and specificity in detecting breast cancer.
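A minimal analogue of this WEKA comparison can be run in scikit-learn, which ships the WDBC dataset; the classifier settings here are illustrative defaults, not the paper's exact configurations:

```python
# Compare a few classifiers on the WDBC dataset by 5-fold cross-validation.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)  # the WDBC dataset
means = {}
for name, clf in [("Naive Bayes", GaussianNB()),
                  ("SVM", SVC()),
                  ("Decision tree", DecisionTreeClassifier(random_state=0))]:
    means[name] = cross_val_score(clf, X, y, cv=5).mean()
    print(f"{name:14s} mean accuracy = {means[name]:.3f}")
```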
QUANTUM CLUSTERING-BASED FEATURE SUBSET SELECTION FOR MAMMOGRAPHIC I... (ijcsit)
In this paper, we present an algorithm for feature selection, labeled QC-FS (Quantum Clustering for Feature Selection), which performs the selection in two steps. First, the original feature space is partitioned into groups of similar features using the Quantum Clustering algorithm. Then a representative is selected for each cluster, using similarity measures such as the correlation coefficient (CC) and mutual information (MI); the feature that maximizes this information is chosen by the algorithm.
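The two-step idea can be sketched as follows; note that k-means stands in for the paper's Quantum Clustering, and mutual information with the target is used as the representative-selection score:

```python
# Sketch: group correlated features into clusters, then keep one
# representative per cluster (the feature with highest mutual information
# with the class labels).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import mutual_info_classif

X, y = load_breast_cancer(return_X_y=True)
mi = mutual_info_classif(X, y, random_state=0)

# cluster the features by their correlation profiles
corr = np.corrcoef(X, rowvar=False)
labels = KMeans(n_clusters=8, n_init=10, random_state=0).fit_predict(corr)

selected = [int(np.arange(X.shape[1])[labels == c][np.argmax(mi[labels == c])])
            for c in range(8)]
print(sorted(selected))  # indices of the 8 representative features
```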
On the High Dimensional Information Processing in Quaternionic Domain and its... (IJAAS Team)
There are various high-dimensional engineering and scientific applications in communication, control, robotics, computer vision, biometrics, etc., where researchers face the problem of designing intelligent and robust neural systems that can process higher-dimensional information efficiently. Conventional real-valued neural networks have been tried on problems with high-dimensional parameters, but the required network structures are highly complex, very time consuming and weak against noise. These networks are also unable to learn magnitude and phase values simultaneously in space. The quaternion is a number that possesses magnitude in all four directions, with phase information embedded within it. This paper presents a well-generalized learning machine based on a quaternionic-domain neural network that can process the magnitude and phase information of high-dimensional data without difficulty. The learning and generalization capability of the proposed machine is demonstrated through a wide spectrum of simulations.
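The 4-D arithmetic a quaternionic neuron builds on is the Hamilton product; a small self-contained sketch:

```python
import numpy as np

def hamilton_product(p, q):
    """Hamilton product of two quaternions given as (w, x, y, z) arrays.

    This non-commutative product is the basic operation that
    quaternion-valued network weights apply to their inputs."""
    w1, x1, y1, z1 = p
    w2, x2, y2, z2 = q
    return np.array([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2,
    ])

i = np.array([0.0, 1.0, 0.0, 0.0])
j = np.array([0.0, 0.0, 1.0, 0.0])
print(hamilton_product(i, j))  # i * j = k -> [0, 0, 0, 1]
```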
Tutorial at the Winter School on Machine Learning, Gran Canaria, January 2020 (ppsx format, 52 slides)
Michael Biehl, University of Groningen, The Netherlands
Talk presented at WSOM 2016 in Houston/Texas.
Machine learning based classification of FDG-PET scan data for the diagnosis of neurodegenerative disorders
June 2017: Biomedical applications of prototype-based classifiers and relevan... (University of Groningen)
A presentation of several biomedical applications of prototype-based machine learning and relevance learning. Invited talk at the AlCoB conference 2017 in Aveiro/Portugal.
An introduction to variable and feature selection (Marco Meoni)
Presentation of a great 2003 paper by Isabelle Guyon (Clopinet) and André Elisseeff (Max Planck Institute), which outlines the main techniques for feature selection and model validation in machine learning systems.
Metabolomic Data Analysis Workshop and Tutorials (2014), Dmitry Grapov
Get more information:
http://imdevsoftware.wordpress.com/2014/10/11/2014-metabolomic-data-analysis-and-visualization-workshop-and-tutorials/
Recently I had the pleasure of teaching statistical and multivariate data analysis and visualization at the annual Summer Sessions in Metabolomics 2014, organized by the NIH West Coast Metabolomics Center.
Similar to last year, I’ve posted all the content (lectures, labs and software) for anyone to follow along with at their own pace. I also plan to release videos for all the lectures and labs.
Prote-OMIC Data Analysis and Visualization (Dmitry Grapov)
Introductory lecture to multivariate analysis of proteomic data.
Material from the UC Davis 2014 Proteomics Workshop.
See more at: http://sourceforge.net/projects/teachingdemos/files/2014%20UC%20Davis%20Proteomics%20Workshop/
Interpretable machine learning in endocrinology, M. Biehl, APPIS 2024 (University of Groningen)
An introduction to interpretable machine learning in endocrinology.
In particular, the application of Generalized Matrix Relevance LVQ to the classification of adrenocortical tumors and the differential diagnosis of primary aldosteronism is discussed.
A tutorial given at the AMALEA workshop 2022:
Unsupervised and supervised prototype-based learning is illustrated in terms of bio-medical applications.
A tutorial given at the AMALEA workshop 2022.
This talk presents the statistical physics based theory of machine learning in terms of simple example systems. As a recent application, the occurrence of phase transitions in layered networks is discussed.
Short presentation (15 minutes) focusing on the application of unsupervised and supervised machine learning in the paper "Tissue- and development-stage specific mRNA and heterogeneous CNV signatures of human ribosomal proteins in normal and cancer samples".
Professional air quality monitoring systems provide immediate, on-site data for analysis, compliance, and decision-making.
Monitor common gases, weather parameters, particulates.
Brief information about the SCOP protein database used in bioinformatics.
The Structural Classification of Proteins (SCOP) database is a comprehensive and authoritative resource for the structural and evolutionary relationships of proteins. It provides a detailed and curated classification of protein structures, grouping them into families, superfamilies, and folds based on their structural and sequence similarities.
(May 29th, 2024) Advancements in Intravital Microscopy - Insights for Preclini... (Scintica Instrumentation)
Intravital microscopy (IVM) is a powerful tool used to study cellular behavior over time and space in vivo. Much of our understanding of cell biology has been gained using various in vitro and ex vivo methods; however, these studies do not necessarily reflect the natural dynamics of biological processes. Unlike traditional cell culture or fixed-tissue imaging, IVM allows ultra-fast, high-resolution imaging of cellular processes over time and space in their natural environment. Real-time visualization of biological processes in the context of an intact organism maintains physiological relevance and provides insights into the progression of disease, response to treatments and developmental processes.
In this webinar we give an overview of advanced applications of the IVM system in preclinical research. IVIM Technology provides all-in-one intravital microscopy systems and solutions optimized for in vivo imaging of live animal models at sub-micron resolution. The system’s unique features and user-friendly software enable researchers to probe fast, dynamic biological processes such as immune cell tracking, cell-cell interaction, vascularization and tumor metastasis in exceptional detail. This webinar also gives an overview of IVM in drug development, offering a view into the intricate interaction between drugs or nanoparticles and tissues in vivo and allowing the evaluation of therapeutic interventions in a variety of tissues and organs. This interdisciplinary collaboration continues to drive the advancement of novel therapeutic strategies.
Richard's entangled adventures in wonderland (Richard Gill)
Since the loophole-free Bell experiments of 2020 and the Nobel prizes in physics of 2022, critics of Bell's work have retreated to the fortress of super-determinism. Now, super-determinism is a derogatory word - it just means "determinism". Palmer, Hance and Hossenfelder argue that quantum mechanics and determinism are not incompatible, using a sophisticated mathematical construction based on a subtle thinning of allowed states and measurements in quantum mechanics, such that what is left appears to make Bell's argument fail, without altering the empirical predictions of quantum mechanics. I think however that it is a smoke screen, and the slogan "lost in math" comes to my mind. I will discuss some other recent disproofs of Bell's theorem using the language of causality based on causal graphs. Causal thinking is also central to law and justice. I will mention surprising connections to my work on serial killer nurse cases, in particular the Dutch case of Lucia de Berk and the current UK case of Lucy Letby.
Seminar of U.V. Spectroscopy by SAMIR PANDA
Spectroscopy is the branch of science dealing with the study of the interaction of electromagnetic radiation with matter.
Ultraviolet-visible spectroscopy refers to absorption or reflectance spectroscopy in the UV-VIS spectral region.
Ultraviolet-visible spectroscopy is an analytical method that can measure the amount of light absorbed by the analyte.
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN (Sérgio Sacani)
The return of a sample of near-surface atmosphere from Mars would facilitate answers to several first-order science questions surrounding the formation and evolution of the planet. One of the important aspects of terrestrial planet formation in general is the role that primary atmospheres played in influencing the chemistry and structure of the planets and their antecedents. Studies of the martian atmosphere can be used to investigate the role of a primary atmosphere in its history. Atmosphere samples would also inform our understanding of the near-surface chemistry of the planet, and ultimately the prospects for life. High-precision isotopic analyses of constituent gases are needed to address these questions, requiring that the analyses are made on returned samples rather than in situ.
Cancer cell metabolism: special Reference to Lactate Pathway (AADYARAJPANDEY1)
Normal Cell Metabolism:
Cellular respiration describes the series of steps that cells use to break down sugar and other chemicals to get the energy we need to function.
Energy is stored in the bonds of glucose and when glucose is broken down, much of that energy is released.
Cells utilize energy in the form of ATP.
The first step of respiration is called glycolysis. In a series of steps, glycolysis breaks glucose into two smaller molecules - a chemical called pyruvate. A small amount of ATP is formed during this process.
Most healthy cells continue the breakdown in a second process, called the Krebs cycle. The Krebs cycle allows cells to “burn” the pyruvate made in glycolysis to get more ATP.
The last step in the breakdown of glucose is called oxidative phosphorylation (Ox-Phos).
It takes place in specialized cell structures called mitochondria. This process produces a large amount of ATP. Importantly, cells need oxygen to complete oxidative phosphorylation.
If a cell completes only glycolysis, only 2 molecules of ATP are made per glucose. However, if the cell completes the entire respiration process (glycolysis - Krebs cycle - oxidative phosphorylation), about 36 molecules of ATP are created, giving it much more energy to use.
IN CANCER CELL:
Unlike healthy cells that "burn" the entire molecule of sugar to capture a large amount of energy as ATP, cancer cells are wasteful.
Cancer cells only partially break down sugar molecules. They overuse the first step of respiration, glycolysis. They frequently do not complete the second step, oxidative phosphorylation.
This results in only 2 molecules of ATP per each glucose molecule instead of the 36 or so ATPs healthy cells gain. As a result, cancer cells need to use a lot more sugar molecules to get enough energy to survive.
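The glucose-hunger arithmetic above can be checked in a few lines, using the approximate textbook yields quoted in the text:

```python
# ATP yield per glucose: glycolysis alone vs. full respiration
# (glycolysis + Krebs cycle + oxidative phosphorylation, approximate).
atp_glycolysis = 2
atp_full = 36

ratio = atp_full / atp_glycolysis
print(f"A glycolysis-only cell needs ~{ratio:.0f}x more glucose per ATP")
```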
Introduction to the WARBURG PHENOMENON:
WARBURG EFFECT Usually, cancer cells are highly glycolytic (glucose addiction) and take up more glucose than do normal cells from outside.
Otto Heinrich Warburg (; 8 October 1883 – 1 August 1970) In 1931 was awarded the Nobel Prize in Physiology for his "discovery of the nature and mode of action of the respiratory enzyme.
WARNBURG EFFECT : cancer cells under aerobic (well-oxygenated) conditions to metabolize glucose to lactate (aerobic glycolysis) is known as the Warburg effect. Warburg made the observation that tumor slices consume glucose and secrete lactate at a higher rate than normal tissues.
Introduction:
RNA interference (RNAi) or Post-Transcriptional Gene Silencing (PTGS) is an important biological process for modulating eukaryotic gene expression.
It is a highly conserved process of post-transcriptional gene silencing in which double-stranded RNA (dsRNA) causes sequence-specific degradation of mRNA sequences.
dsRNA-induced gene silencing (RNAi) has been reported in a wide range of eukaryotes, including worms, insects, mammals, and plants.
This process mediates resistance to both endogenous parasitic and exogenous pathogenic nucleic acids, and regulates the expression of protein-coding genes.
What are small ncRNAs?
micro RNA (miRNA)
short interfering RNA (siRNA)
Properties of small non-coding RNA:
Involved in silencing mRNA transcripts.
Called “small” because they are usually only about 21-24 nucleotides long.
Synthesized by first cutting up longer precursor sequences (like the 61nt one that Lee discovered).
Silence an mRNA by base pairing with some sequence on the mRNA.
Discovery of siRNA?
The first small RNA:
In 1993 Rosalind Lee (Victor Ambros lab) was studying a non- coding gene in C. elegans, lin-4, that was involved in silencing of another gene, lin-14, at the appropriate time in the
development of the worm C. elegans.
Two small transcripts of lin-4 (22nt and 61nt) were found to be complementary to a sequence in the 3' UTR of lin-14.
Because lin-4 encoded no protein, she deduced that it must be these transcripts that are causing the silencing by RNA-RNA interactions.
Types of RNAi (non-coding RNA):
miRNA
- length: 23-25 nt
- trans-acting
- binds the target mRNA with mismatches
- leads to translation inhibition
siRNA
- length: 21 nt
- cis-acting
- binds the target mRNA as a perfectly complementary sequence
piRNA (Piwi-interacting RNA)
- length: 25-36 nt
- expressed in germ cells
- regulates transposon activity
MECHANISM OF RNAI:
First the double-stranded RNA teams up with a protein complex named Dicer, which cuts the long RNA into short pieces.
Then another protein complex called RISC (RNA-induced silencing complex) discards one of the two RNA strands.
The RISC-docked, single-stranded RNA then pairs with the homologous mRNA and destroys it.
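The sequence-specific pairing step above can be illustrated with a toy script (not a bioinformatics tool): finding the region of an mRNA that is perfectly complementary to an siRNA guide strand, as in siRNA-guided RISC target recognition. Both sequences here are hypothetical.

```python
# Watson-Crick complements for RNA bases
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA sequence (5'->3')."""
    return "".join(COMPLEMENT[b] for b in reversed(rna))

def find_target_site(guide: str, mrna: str) -> int:
    """Start index of the mRNA region perfectly complementary
    to the siRNA guide strand, or -1 if there is none."""
    return mrna.find(reverse_complement(guide))

guide = "AUGGCAU"       # hypothetical siRNA guide strand
mrna = "GGGAUGCCAUCCC"  # hypothetical target mRNA
site = find_target_site(guide, mrna)
```

Perfect complementarity (siRNA) triggers cleavage; miRNAs, which pair with mismatches, instead typically repress translation.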
THE RISC COMPLEX:
RISC is a large (>500 kDa) multi-protein RNA-binding complex that triggers sequence-specific mRNA degradation.
Unwinding of the double-stranded siRNA by an ATP-independent helicase.
The active component of RISC is the Argonaute (Ago) protein, an endonuclease that cleaves the target mRNA.
DICER: endonuclease (RNase Family III)
Argonaute: Central Component of the RNA-Induced Silencing Complex (RISC)
One strand of the dsRNA produced by Dicer is retained in the RISC complex in association with Argonaute
ARGONAUTE PROTEIN :
1. PAZ (PIWI/Argonaute/Zwille) domain: recognition of the target mRNA.
2. PIWI (P-element Induced Wimpy testis) domain: cleaves the phosphodiester bond of the mRNA (RNase H activity).
miRNA:
Double-stranded RNAs are naturally produced in eukaryotic cells during development, and they have a key role in regulating gene expression.
1. Michael Biehl,
www.cs.rug.nl/~biehl
Intelligent Systems
Bernoulli Institute for Mathematics,
Computer Science and Artificial Intelligence
University of Groningen, The Netherlands
Medical applications of machine learning:
Prototype-based classifiers and relevance learning
2. Prototype and distance-based systems
• basic concepts of Learning Vector Quantization
• distance measures and relevance learning
• adrenal tumor classification: steroid metabolomics
• early stages of rheumatoid arthritis: cytokine expression
• neurodegenerative diseases: 3D FDG-PET scan images
Application examples
Challenges, summary and outlook
Medical applications of machine learning:
Prototype-based classifiers and relevance learning
3. supervised learning
classification / regression / prediction
based on labeled example data
generic workflow:
example data → model (training) → apply to novel data (working)
obvious performance measures: overall / class-wise accuracy,
ROC, Precision-Recall, ...
but ... validation is needed to:
- estimate working performance
- set parameters of model / training
- compare different models
4. accuracy is not enough (P. Lisboa)
a machine learning urban legend
US military in the 1990s:
- classifier to distinguish US from Russian tanks
- trained on a data set of still images
- nearly perfect classification performance
(training and also validation / test)
- complete failure “in practice”
(figure: American tank vs. Russian tank)
only almost true :-)
5. models should be:
transparent / intuitive / interpretable, white box
e.g.: decision criteria used by the classifier
important features contributing
- avoid artifacts, e.g. due to hidden bias in the data
- gain better insight into the data set / problem
- potentially understand underlying mechanisms
one useful framework:
similarity or distance based methods
representation / parameterization in terms of prototypes
to be avoided: blind application of black box machine learning
accuracy is not enough
6. IAC Winter School 2018, La Laguna
distance-based classifiers
a simple distance-based system: (K) NN classifier
• store a set of labeled examples
• classify a query according to the
label of the Nearest Neighbor
(or the majority of K NN)
• piece-wise linear decision
boundaries according
to (e.g.) Euclidean distance
from all examples
(figure: query '?' in N-dim. feature space)
+ conceptually simple,
+ no training phase
+ only one parameter (K)
- expensive (storage, computation)
- sensitive to mislabeled data
- overly complex decision boundaries
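The (K)NN classifier described above can be sketched in a few lines; a minimal illustration with Euclidean distance and made-up data:

```python
import numpy as np
from collections import Counter

def knn_classify(X, y, query, k=3):
    """Majority vote among the k Euclidean nearest stored examples."""
    dists = np.linalg.norm(X - query, axis=1)  # distance to every stored example
    nearest = np.argsort(dists)[:k]            # indices of the k closest examples
    return Counter(y[i] for i in nearest).most_common(1)[0][0]

# tiny made-up two-class data set
X = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]])
y = ["A", "A", "B", "B"]
label = knn_classify(X, y, np.array([0.1, 0.0]), k=3)
```

Note how there is no training phase at all: the full data set is the model, which is exactly what makes (K)NN expensive in storage and computation.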
7. prototype-based classification
• represent the data by one or
several prototypes per class
• classify a query according to the
label of the nearest prototype
(or alternative schemes)
• local decision boundaries acc.
to (e.g.) Euclidean distances
(figure: query '?' in N-dim. feature space)
+ parameterization in feature space, interpretability
+ robust, low storage needs,
little computational effort
+ natural for multi-class problems
- model selection: number of prototypes per class, etc.
- requires training: placement of prototypes in feature space
Learning Vector Quantization [Kohonen]
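Nearest-prototype classification itself is a one-liner once the prototypes exist; a minimal sketch with one prototype per class (positions here are illustrative, in practice they are obtained by training):

```python
import numpy as np

def nearest_prototype(prototypes, labels, query):
    """Label of the closest prototype (winner-takes-all)."""
    d = np.linalg.norm(prototypes - query, axis=1)
    return labels[int(np.argmin(d))]

# illustrative prototypes, one per class
protos = np.array([[0.0, 0.0], [2.0, 2.0]])
labels = ["healthy", "disease"]
pred = nearest_prototype(protos, labels, np.array([0.4, 0.3]))
```

Compared to (K)NN, only two vectors are stored here instead of the whole data set.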
8. ∙ identification of prototype vectors from labeled example data
∙ distance based classification (e.g. Euclidean)
Learning Vector Quantization
N-dimensional data, feature vectors
• initialize prototype vectors
for different classes
competitive learning: LVQ1 [Kohonen]
• identify the winner
(closest prototype)
• present a single example
• move the winner
- closer towards the data (same class)
- away from the data (different class)
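A single LVQ1 step as outlined above: present one example, identify the winner, and move it towards or away from the example depending on label agreement. The learning rate eta is an illustrative choice.

```python
import numpy as np

def lvq1_step(prototypes, proto_labels, x, x_label, eta=0.1):
    """One competitive-learning update; modifies prototypes in place."""
    dists = np.linalg.norm(prototypes - x, axis=1)
    j = int(np.argmin(dists))                    # winner: closest prototype
    sign = 1.0 if proto_labels[j] == x_label else -1.0
    prototypes[j] += sign * eta * (x - prototypes[j])  # attract or repel
    return j

protos = np.array([[0.0, 0.0], [1.0, 1.0]])
labels = ["A", "B"]
winner = lvq1_step(protos, labels, np.array([0.2, 0.0]), "A")
```

Iterating this over many example presentations (with decreasing eta) yields the prototype placement used by the classifier.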
9. ∙ identification of prototype vectors from labeled example data
∙ distance based classification (e.g. Euclidean)
Learning Vector Quantization
N-dimensional data, feature vectors
∙ tesselation of feature space
[piece-wise linear]
∙ distance-based classification
[here: Euclidean distances]
∙ generalization ability
correct classification of new data
∙ aim: discrimination of classes
( ≠ vector quantization
or density estimation )
10. cost function based LVQ
one example: Generalized LVQ (GLVQ) cost function [Sato&Yamada, 1995]
two winning prototypes per example: the closest correct prototype (distance d_J)
and the closest incorrect prototype (distance d_K)
minimize E = Σ_μ Φ(e_μ) with e_μ = (d_J − d_K) / (d_J + d_K)
E favors
- a small number of misclassifications, e.g. with a sigmoidal Φ
- large margins between classes:
small d_J, large d_K
- class-typical prototypes
There is nothing objective about objective functions
- J. McClelland
11. GLVQ
training = optimization with respect to prototype position,
e.g. single example presentation, stochastic gradient descent,
update of two prototypes per step:
based on a non-negative distance measure
additional requirement: differentiability with respect to the prototypes
a variety of distance measures can be used in the cost function
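The per-example GLVQ cost term can be sketched directly; a minimal illustration with squared Euclidean distances and made-up prototype positions (not the full stochastic-gradient training):

```python
import numpy as np

def glvq_term(prototypes, proto_labels, x, x_label):
    """Per-example GLVQ term e = (dJ - dK) / (dJ + dK), with squared
    Euclidean distances; e < 0 iff the example is classified correctly."""
    d = np.sum((prototypes - x) ** 2, axis=1)
    correct = [i for i, c in enumerate(proto_labels) if c == x_label]
    wrong = [i for i, c in enumerate(proto_labels) if c != x_label]
    dJ = min(d[i] for i in correct)  # closest prototype with the same label
    dK = min(d[i] for i in wrong)    # closest prototype with a different label
    return (dJ - dK) / (dJ + dK)

protos = np.array([[0.0, 0.0], [1.0, 0.0]])
e = glvq_term(protos, ["A", "B"], np.array([0.1, 0.0]), "A")
```

Summing Φ(e) over all examples gives the cost E; gradient descent on E updates exactly the two winning prototypes per step.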
12. fixed, pre-defined distance measures:
Minkowski measures
kernelized distances
divergences, e.g.
...
alternative distance measures
possible work-flow
- select several distance measures according to prior knowledge
or in a data-driven preprocessing step
- compare performance of various measures (e.g. cross-validation)
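The suggested workflow, comparing several fixed distance measures by cross-validation, might look as follows; a toy leave-one-out comparison of two Minkowski measures (p = 1, p = 2) with a 1-NN classifier on made-up data:

```python
import numpy as np

def loo_accuracy(X, y, p):
    """Leave-one-out 1-NN accuracy under the Minkowski-p distance."""
    hits = 0
    for i in range(len(X)):
        d = np.sum(np.abs(X - X[i]) ** p, axis=1) ** (1.0 / p)
        d[i] = np.inf                      # exclude the held-out point itself
        hits += y[int(np.argmin(d))] == y[i]
    return hits / len(X)

X = np.array([[0.0, 0.0], [0.1, 0.0], [0.2, 0.0],
              [1.0, 1.0], [1.1, 1.0], [0.9, 1.0]])
y = np.array([0, 0, 0, 1, 1, 1])
scores = {p: loo_accuracy(X, y, p) for p in (1, 2)}
```

On real data the measures would differ, and the best-scoring one would be selected; relevance learning (next slide) replaces this discrete selection by a continuous, data-driven optimization.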
13. astronomy vs. astrology
the 'right' distance?
angle instead of physical 3-dim. distance ~ normalization to unit sphere
even worse in combination with over-fitting
14. Relevance Learning
elegant framework: relevance learning / adaptive distances
- employ a parameterized distance measure
with only the mathematical form fixed in advance
- optimize its parameters in the training process
- adaptive, data driven dissimilarity
example: Matrix Relevance LVQ
- data-driven optimization of prototypes
and relevance matrix
- in the same training process (≠ pre-processing )
17. GMLVQ
generalized quadratic distance in LVQ: d(w, x) = (x − w)^T Λ (x − w) with Λ = Ω^T Ω [Schneider, Biehl, Hammer, 2009]
variants:
one global, several local, class-wise relevance matrices
rectangular low-dim. representation / visualization
[Bunte et al., 2012]
diagonal matrices: single feature weights [Hammer et al., 2002]
training: adaptation of prototypes
and distance measure guided by
GLVQ cost function
Generalized Matrix Relevance LVQ:
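The GMLVQ distance d(w, x) = (x − w)^T Λ (x − w) with Λ = Ω^T Ω can be evaluated in a few lines; the parameterization via Ω guarantees d ≥ 0 for any real matrix. The Ω below is an illustrative value, not a trained matrix:

```python
import numpy as np

def gmlvq_distance(x, w, omega):
    """Adaptive quadratic distance; Lambda = Omega^T Omega is
    positive semi-definite, so the distance is non-negative."""
    diff = x - w
    return float(diff @ omega.T @ omega @ diff)

omega = np.array([[1.0, 0.0],
                  [0.0, 0.5]])   # relevance parameters (adapted during training)
d = gmlvq_distance(np.array([1.0, 2.0]), np.array([0.0, 0.0]), omega)
```

In training, both the prototypes w and the matrix Ω are updated by gradient descent on the GLVQ cost; a rectangular Ω yields the low-dimensional representation mentioned above.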
18. interpretation of the Relevance Matrix
the diagonal element Λ_ii summarizes
• the contribution of a single dimension i
• the relevance of original features in the classifier
the off-diagonal element Λ_ij quantifies the contribution of pairs of
features (i, j) to the distance
Note: this interpretation implicitly assumes that features have equal
order of magnitude, e.g. after z-score transformation
(zero mean, unit variance as averages over the data set)
after training:
prototypes represent typical class properties or subtypes
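The interpretation step can be sketched as below: after z-score transformation of the features, the diagonal of Λ = Ω^T Ω is read as single-feature relevances and the off-diagonal entries as pairwise contributions. The Ω here is an illustrative value, not a trained matrix:

```python
import numpy as np

# toy data matrix: 3 samples x 2 features
X = np.array([[1.0, 10.0], [2.0, 30.0], [3.0, 20.0]])
Xz = (X - X.mean(axis=0)) / X.std(axis=0)  # z-score: zero mean, unit variance

omega = np.array([[0.9, 0.1],
                  [0.2, 0.3]])             # illustrative relevance parameters
Lam = omega.T @ omega
feature_relevances = np.diag(Lam)          # Lambda_ii: single-feature relevance
pair_term = Lam[0, 1]                      # Lambda_ij: pairwise contribution
```

Without the z-score step, a feature measured on a large numerical scale would show a misleadingly small relevance value.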
19. Relevance Matrix LVQ
optimization of
prototype positions
distance measure(s)
in one training process
(≠ pre-processing)
motivation:
improved performance
- weighting of features and pairs of features
simplified classification schemes
- elimination of non-informative, noisy features
- discriminative low-dimensional representation
insight into the data / classification problem
- identification of most discriminative features
- incorporation of prior knowledge (e.g. structure of Ω)
21. empirical observation / theory:
relevance matrix becomes
singular, dominated by
very few eigenvectors
prevents over-fitting in
high-dim. feature spaces
facilitates discriminative
visualization / low-dim.
representation of datasets
confirms: Setosa well-separated
from Virginica / Versicolor
Relevance Matrix LVQ
22. three application examples
I) steroid metabolomics
- discrimination of malignant vs. benign adrenal tumors
based on urinary steroid metabolite excretion
main aim: practical diagnosis support tool
II) cytokine expression data
- detection of (early) rheumatoid arthritis
based on synovial tissue samples
main aim: marker identification, disease mechanisms
III) FDG-PET scan brain images
- diagnosis / discrimination of neurodegenerative diseases
based on 3D functional imaging
main aim: method development /processing pipelines
23. (I) Steroid metabolomics: detecting
malignancy in adrenocortical tumors
www.ensat.org
W. Arlt, M. Biehl, A. Taylor, S. Hahner, R. Libé, B. Hughes, P. Schneider,
D. Smith, H. Stiekema, N. Krone, E. Porfiri, G. Opocher, J. Bertherat,
F. Mantero, B. Allolio, M. Terzolo, P. Nightingale, C. Shackleton,
X. Bertagna, M.Fassnacht, P. Stewart
Urine Steroid Metabolomics as a Biomarker Tool for Detecting
Malignancy in Patients with Adrenal Tumors
J Clinical Endocrinology & Metabolism 96: 3775-3784 (2011)
24. www.ensat.org
classification of adrenocortical tumors (adenoma vs. carcinoma)
based on steroid hormone excretion profiles
benign ACA malignant ACC
features: 32 steroid metabolite excretion values
non-invasive measurement (24 hrs. urine samples)
steroid metabolomics
aim: develop a novel biomarker tool for differential diagnosis
idea: identify characteristic steroid profiles (prototypes)
25. Generalized Matrix LVQ , ACC vs. ACA classification
∙ data divided in 90% training, 10% test set, (z-score transformed)
∙ determine prototypes
typical profiles (1 per class)
∙ apply classifier to test data
evaluate performance (error rates, ROC)
∙ adaptive generalized quadratic distance measure,
parameterized by the relevance matrix Λ = Ω^T Ω
∙ repeat and average over many random splits
[Arlt et al., 2011]
[Biehl et al., 2012]
steroid metabolomics
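The evaluation protocol above (many randomized 90%/10% splits, performance averaged over runs) can be sketched as follows; a trivial nearest-mean classifier stands in for GMLVQ, and the data are synthetic:

```python
import numpy as np

rng = np.random.default_rng(0)

def nearest_mean_predict(Xtr, ytr, Xte):
    """Classify by the closer class mean (one 'prototype' per class)."""
    means = {c: Xtr[ytr == c].mean(axis=0) for c in np.unique(ytr)}
    classes = sorted(means)
    d = np.stack([np.linalg.norm(Xte - means[c], axis=1) for c in classes])
    return np.array(classes)[np.argmin(d, axis=0)]

# synthetic two-class data: 100 samples, 3 features
X = np.vstack([rng.normal(0, 1, (50, 3)), rng.normal(3, 1, (50, 3))])
y = np.array([0] * 50 + [1] * 50)

errors = []
for _ in range(100):                       # repeat over random 90/10 splits
    idx = rng.permutation(len(X))
    tr, te = idx[:90], idx[90:]
    pred = nearest_mean_predict(X[tr], y[tr], X[te])
    errors.append(np.mean(pred != y[te]))
mean_error = float(np.mean(errors))
```

Averaging over many splits stabilizes the estimated test error, which matters for the small cohort sizes typical of clinical data.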
27. subset of selected steroids ↔ technical realization (patented, UoB)
using 9 markers only, similar ROC
Relevance matrix
… of pairs of markers
contribution of single markers
steroid metabolomics
28. ∙ Receiver Operating Characteristics (ROC)
ROC considers a modified (biased) classification scheme, sweeping a bias θ:
(figure: true positive rate (sensitivity) vs. false positive rate
(1-specificity); Area under Curve (AUC); unbiased classifier at θ = 0)
one extreme: all tumors classified as ACA
- no false positives
- no true positives detected
other extreme: all tumors classified as ACC
- all true positives detected
- max. number of false positives
steroid metabolomics
Note: different types of errors have very different consequences!
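The ROC construction described above, sweeping a threshold over the classifier output, recording sensitivity against 1−specificity, and integrating for the AUC, can be sketched as below; the scores and labels are made up:

```python
import numpy as np

def roc_auc(scores, labels):
    """ROC curve points and AUC via the trapezoidal rule."""
    order = np.argsort(-scores)                 # sort by descending score
    labels = labels[order]
    tps = np.cumsum(labels == 1)                # true positives at each cut
    fps = np.cumsum(labels == 0)                # false positives at each cut
    tpr = np.concatenate([[0.0], tps / tps[-1]])  # sensitivity
    fpr = np.concatenate([[0.0], fps / fps[-1]])  # 1 - specificity
    auc = float(np.sum((fpr[1:] - fpr[:-1]) * (tpr[1:] + tpr[:-1]) / 2.0))
    return fpr, tpr, auc

scores = np.array([0.9, 0.8, 0.7, 0.3, 0.2, 0.1])  # made-up classifier outputs
labels = np.array([1, 1, 1, 0, 1, 0])              # 1 = ACC, 0 = ACA (toy)
fpr, tpr, auc = roc_auc(scores, labels)
```

Because the two error types carry very different clinical costs, the working point on this curve, not just the AUC, must be chosen deliberately.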
29. ROC characteristics
clear improvement due to adaptive distances
90% / 10% randomized splits of the data in training and test set,
averages over 1000 runs
(ROC figure: sensitivity vs. 1-specificity)
AUC: Euclidean 0.87, diagonal relevances 0.93, full matrix 0.97
steroid metabolomics
32. visualization of the data set
ACA
ACC
generic property: relevance matrix becomes highly singular
33. • monitoring of patients after surgery and/or under medication
aim: recurrence detection proof of concept study submitted
work in progress
• high-throughput LC/MS assay to replace GC/MS,
publication in preparation
• other disorders affecting / related to steroid metabolism
e.g. liver disease (NAFLD etc.), first results submitted
(with J. Tomlinson, Oxford)
• prospective study with ~2000 patients, submitted
confirms performance as a practical diagnosis system
34. (II) Early stages of Rheumatoid Arthritis
Expression of chemokines CXCL4 and CXCL7 by synovial
macrophages defines an early stage of rheumatoid arthritis
Annals of the Rheumatic Diseases 75:763-771 (2016)
L. Yeo, N. Adlard, M. Biehl, M. Juarez, M. Snow
C.D. Buckley, A. Filer, K. Raza, D. Scheel-Toellner
35. patient groups: uninflamed control, established RA,
early inflammation: resolving vs. early RA
cytokine-based diagnosis of RA at the earliest possible stage?
ultimate goals: understand pathogenesis and
mechanism of progression
rheumatoid arthritis (RA)
37. GMLVQ analysis
pre-processing:
• log-transformed expression values
• 21 leading principal components explain 95% of the variation
Two two-class problems: (A) established RA vs. uninflamed controls
(B) early RA vs. resolving inflammation
• 1 prototype per class, global relevance matrix, distance measure:
• leave-two-out validation (one from each class)
evaluation in terms of Receiver Operating Characteristics
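The leave-two-out scheme above (hold out one sample from each class, train on the rest, repeat over all cross-class pairs) might be sketched like this; a nearest-mean rule stands in for the GMLVQ classifier, and the data are synthetic:

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(1)
# synthetic two-class data: 8 samples per class, 4 features
X = np.vstack([rng.normal(0, 1, (8, 4)), rng.normal(2, 1, (8, 4))])
y = np.array([0] * 8 + [1] * 8)

idx0 = np.where(y == 0)[0]
idx1 = np.where(y == 1)[0]

correct, total = 0, 0
for i, j in product(idx0, idx1):        # one held-out sample per class
    mask = np.ones(len(X), bool)
    mask[[i, j]] = False                # remove the held-out pair from training
    m0 = X[mask & (y == 0)].mean(axis=0)
    m1 = X[mask & (y == 1)].mean(axis=0)
    for k in (i, j):
        pred = 0 if np.linalg.norm(X[k] - m0) < np.linalg.norm(X[k] - m1) else 1
        correct += pred == y[k]
        total += 1
accuracy = correct / total
```

Holding out one sample from each class keeps the class balance of every training set intact, which matters for small cohorts like this one.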
40. CXCL4 chemokine (C-X-C motif) ligand 4
CXCL7 chemokine (C-X-C motif) ligand 7
direct study on the protein level, staining / imaging of synovial tissue:
macrophages : predominant source of CXCL4/7 expression
protein level studies
• high levels of CXCL4 and
CXCL7 in early RA
• expression on macrophages
outside of blood vessels
discriminates
early RA / resolving cases
42. future work
• more samples (difficult...) needed in order
to obtain a reliable early diagnosis
• integrated analysis of gene expression and other data
from the same / an analogous patient cohort
43. (III) Analysis of FDG-PET image data for the
diagnosis of neurodegenerative disorders
44. based on: FDG-PET scan brain images
subject scores derived from 3D images
data: acquired at three different medical centers
identical equipment, identical processing (??)
aim: ultimately, early and reliable diagnosis of neurodegenerative
disorders: Alzheimer’s disease (AD), Parkinson’s disease (PD), etc.
analysis: machine learning, classifiers: SVM, (L)GMLVQ
within center and across center performances
questions: reliable FDG-PET based diagnosis ?
compatible across different medical centers ?
can we obtain a robust ‘universal classifier’ ?
overview
45. data: FDG-PET (Fluorodeoxyglucose positron emission tomography)
brain scans, 3D images of glucose uptake, from 3 centers:
• Clínica Universidad de Navarra (CUN)
• Univ. Genoa/IRCCS San Martino (UGOSM)
• Univ. Medical Center Groningen (UMCG)
groups: Healthy Controls (HC), Parkinson’s Disease (PD), Alzheimer’s Disease (AD)

Subjects:
Source   HC   PD   AD
CUN      19   49    -
UGOSM    44   58   55
UMCG     19   20   21

http://glimpsproject.com
46. work flow
per subject: 3D image, ~200000 voxels
→ log-transform
→ masking (*): high-intensity, low-noise voxels, subject-specific anatomy
→ low-dimensional projections by SSM/PCA (*)
→ subject scores
details of pre-processing: D. Mudali et al.,
Computational and Mathematical Methods in Medicine,
March 2015, Art. ID 136921, and references therein
(*) Scaled Subprofile Model / PCA based
on a (disjoint) reference group of subjects
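The dimension-reduction step of the pipeline can be sketched with plain PCA standing in for the SSM/PCA procedure of Mudali et al.; the sizes and data below are made up:

```python
import numpy as np

rng = np.random.default_rng(2)
# toy stand-in for masked, log-transformed voxel data:
# 30 subjects x 500 voxels (real scans have ~200000 voxels)
V = rng.normal(size=(30, 500))

Vc = V - V.mean(axis=0)                        # center across subjects
U, s, Wt = np.linalg.svd(Vc, full_matrices=False)
scores = Vc @ Wt[:5].T                         # subject scores on 5 components
```

The resulting low-dimensional subject scores, rather than raw voxels, are what the classifiers on the following slides receive as input.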
48. (A) Perceptron of optimal stability (aka “SVM with linear kernel”)
- linear threshold classifier
- large margin (with errors)
- Matlab R2016a (Statistics Toolbox):
fitcsvm, predict with default parameters
performance evaluation:
averages over 10 randomized runs of 10-fold cross-validation
accuracies, sensitivity /specificity, ROC, ...
(A,B) have outperformed Decision Trees in previous projects
classifiers
(B) Generalized Matrix Learning Vector Quantization (GMLVQ)
www.cs.rug.nl/~biehl/gmlvq
(C) Local Relevance Matrix LVQ (LGMLVQ)
http://matlabserver.cs.rug.nl/gmlvqweb/web/
with default parameters, one prototype per class
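The performance-evaluation protocol above, averages over 10 randomized runs of 10-fold cross-validation, can be sketched as follows; a nearest-mean rule stands in for the actual SVM / GMLVQ classifiers, and the data are synthetic:

```python
import numpy as np

rng = np.random.default_rng(3)
# synthetic two-class data: 100 subjects, 5 "subject score" features
X = np.vstack([rng.normal(0, 1, (50, 5)), rng.normal(2, 1, (50, 5))])
y = np.array([0] * 50 + [1] * 50)

accs = []
for _ in range(10):                      # 10 randomized runs
    idx = rng.permutation(len(X))
    folds = np.array_split(idx, 10)      # 10-fold cross-validation
    for te in folds:
        tr = np.setdiff1d(idx, te)       # all indices not in the test fold
        m0 = X[tr][y[tr] == 0].mean(axis=0)
        m1 = X[tr][y[tr] == 1].mean(axis=0)
        d0 = np.linalg.norm(X[te] - m0, axis=1)
        d1 = np.linalg.norm(X[te] - m1, axis=1)
        accs.append(np.mean((d1 < d0) == (y[te] == 1)))
mean_acc = float(np.mean(accs))
```

Sensitivity, specificity and ROC curves would be computed per fold in the same loop and then averaged, as in the result tables that follow.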
49. results: within centers
• subjects from one center only, here: UGOSM
relatively good within-center performance (also in 3-class setting)
           Classifier  Sens. (%)     Spec. (%)     AUC (ROC)
PD vs HC   SVM         74.23 (19.0)  68.05 (25.9)  0.80 (0.2)
           GMLVQ       75.13 (16.9)  77.50 (22.5)  0.84 (0.1)
           LGMLVQ      79.23 (15.2)  68.15 (22.6)  0.83 (0.1)
AD vs HC   SVM         95.40 (8.9)   92.00 (13.1)  0.99 (0.0)
           GMLVQ       88.67 (15.0)  92.90 (13.4)  0.97 (0.1)
           LGMLVQ      91.47 (12.3)  91.45 (14.4)  0.98 (0.0)
PD vs AD   SVM         82.10 (16.2)  83.83 (16.0)  0.92 (0.1)
           GMLVQ       81.00 (17.2)  81.67 (15.8)  0.91 (0.1)
           LGMLVQ      84.70 (15.2)  86.63 (14.8)  0.95 (0.1)
[ mean (std. dev.) ]
50. results: across centers
• compatible across different medical centers ?
reasonable, yet lower accuracies across centers
PD vs. HC                     Classifier  Sens. (%)  Spec. (%)  AUC (ROC)
Training: CUN, Test: UGOSM    SVM          58.62      70.45      0.68
                              GMLVQ        86.21      31.82      0.72
                              LGMLVQ       98.28       4.55      0.57
Training: UGOSM, Test: UMCG   SVM         100.00      21.05      0.82
                              GMLVQ        70.00      63.16      0.74
                              LGMLVQ       95.00      47.37      0.91
Training: UMCG, Test: CUN     SVM          54.41      73.68      0.70
                              GMLVQ        33.82      89.47      0.70
                              LGMLVQ       47.06      84.21      0.72
51. experiment - can we classify subjects according to medical center ?
results: prediction of centers
possible explanations:
- center-specific (pre-)processing despite supposedly
identical equipment and work flows
- significantly different patient cohorts (not the case in HC)
HC only          Classifier  Sens. (%)  Spec. (%)  AUC (ROC)
CUN vs. UGOSM    SVM          99.75      93.00      1.00
                 GMLVQ        97.30      91.00      0.99
                 LGMLVQ      100.00      89.50      0.99
52. outlook: voxel space interpretation
PD / AD prototypes: (low-dim.) back-projections into voxel space (pseudo-inverse)
on-going: assessment by radiologists / neurologists
53. outlook: voxel space interpretation
discriminative directions in voxel-space
prototypes
54. outlook: across center classification
aim: unified classifiers with good inter-center performance
check/improve: consistent protocols / assays
unified pre-processing
dummy measurements
matching patient cohorts (?)
transfer learning:
identify and correct systematic differences
adjustment using center-specific prototypes
eliminate center-discriminating directions
55. summary/conclusion
prototype- and distance based systems:
- intuitive, transparent, interpretable
- easy to implement, flexible, a natural tool for multi-class problems
- classification, regression, unsupervised learning, visualization ...
- relevance learning: further insight into data and problem
- suitable for a variety of bio-medical problems
review articles:
M. Biehl, B. Hammer, T. Villmann. Prototype-based models in
Machine Learning. Advanced Review: WIRES Cognitive Science 7(2):
92-111 (2016)
M. Biehl: Biomedical Applications of Prototype Based Classifiers
and Relevance Learning. In: Proc. 4th Intl. Conf. on Algorithms for
Computational Biology. Springer Lecture Notes in Computer Science 10252, 2017
56. http://matlabserver.cs.rug.nl/gmlvqweb/web/
Matlab code:
Relevance and Matrix adaptation in Learning Vector
Quantization (GRLVQ, GMLVQ and LiRaM LVQ) [K. Bunte]
http://www.cs.rug.nl/~biehl/
links
Related pre- and re-prints etc.:
A no-nonsense beginners’ tool for GMLVQ:
http://www.cs.rug.nl/~biehl/gmlvq
A Scikit-Learn compatible collection of Python code
for LVQ and variants, including GMLVQ [Rick van Veen]:
https://sklvq.readthedocs.io/en/stable/
source code:
https://github.com/rickvanveen/sklvq