Biomedical applications of prototype-based
classifiers and relevance learning
Michael Biehl
Intelligent Systems
Johann Bernoulli Institute for
Mathematics and Computing Science
University of Groningen / NL
www.cs.rug.nl/~biehl
AlCoB, June 2017, Aveiro / Portugal
Introduction: prototype-based classification, relevance learning
Generalized Matrix Relevance LVQ
Illustration: three bio-medical applications
supervised learning
classification / regression / prediction
based on labeled example data
generic workflow:
example data → (training) → model → (working) → apply to novel data
obvious performance measures: overall / class-wise accuracy
ROC, Precision Recall ...
validation
estimate working performance
set parameters of model / training
compare different models
accuracy is not enough - interpretable “white-box” systems
example: prototype-based models, distance-based classifiers
distance-based classifiers
a simple distance-based system: (K) NN classifier
• store a set of labeled examples
• classify a query according to the
label of the Nearest Neighbor
(or the majority of K NN)
• piece-wise linear decision
boundaries according
to (e.g.) Euclidean distance
from all examples
[Figure: query point (?) among labeled examples in N-dim. feature space]
+ conceptually simple,
+ no training phase
+ only one parameter (K)
- expensive (storage, computation)
- sensitive to mislabeled data
- overly complex decision boundaries
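The (K) NN scheme described above can be sketched in a few lines. This is a minimal illustration, assuming squared Euclidean distance and a simple majority vote; the function name `knn_classify` and the toy data are hypothetical and not taken from the talk.

```python
import numpy as np

def knn_classify(query, examples, labels, k=1):
    """Return the majority label among the k nearest stored examples."""
    # squared Euclidean distance from the query to every stored example
    dists = np.sum((examples - query) ** 2, axis=1)
    nearest = np.argsort(dists)[:k]
    # majority vote among the k nearest neighbors
    values, counts = np.unique(labels[nearest], return_counts=True)
    return values[np.argmax(counts)]

# toy data (hypothetical): two classes in a 2-dim. feature space
examples = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.2]])
labels = np.array([0, 0, 1, 1])
print(knn_classify(np.array([0.1, 0.0]), examples, labels, k=1))
print(knn_classify(np.array([0.8, 0.9]), examples, labels, k=3))
```

Every query requires a distance to all stored examples, which is the storage and computation cost listed among the drawbacks.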
prototype-based classification
• represent the data by one or
several prototypes per class
• classify a query according to the
label of the nearest prototype
(or alternative schemes)
• local decision boundaries acc.
to (e.g.) Euclidean distances
+ parameterization in feature space, interpretability
+ robust, low storage needs,
little computational effort
- model selection: number of prototypes per class, etc.
requires training: placement of prototypes in feature space
[Figure: query point (?) and prototypes in N-dim. feature space]
Learning Vector Quantization [Kohonen, 1990]
Learning Vector Quantization
N-dimensional data, feature vectors
∙ identification of prototype vectors from labeled example data
∙ distance based classification (e.g. Euclidean)
competitive learning: LVQ1 [Kohonen, 1990]
• initialize prototype vectors
for different classes
• present a single example
• identify the winner
(closest prototype)
• move the winner
- closer towards the data (same class)
- away from the data (different class)
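The LVQ1 steps above can be sketched as follows. This is an illustrative sketch only: the learning rate `eta`, the number of epochs, the random presentation order, and the toy data are assumptions, and the function names are hypothetical.

```python
import numpy as np

def lvq1_step(x, label, prototypes, proto_labels, eta=0.05):
    """One LVQ1 update for a single example (x, label); modifies prototypes in place."""
    dists = np.sum((prototypes - x) ** 2, axis=1)   # squared Euclidean distances
    winner = int(np.argmin(dists))                  # closest prototype
    # towards the example if the class labels agree, away from it otherwise
    sign = 1.0 if proto_labels[winner] == label else -1.0
    prototypes[winner] += sign * eta * (x - prototypes[winner])
    return winner

def lvq1_train(X, y, prototypes, proto_labels, eta=0.05, epochs=20, seed=0):
    """Repeatedly present single examples in random order."""
    rng = np.random.default_rng(seed)
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            lvq1_step(X[i], y[i], prototypes, proto_labels, eta)
    return prototypes

# toy data (hypothetical): two Gaussian clusters in 2 dimensions
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.3, (30, 2)), rng.normal(2.0, 0.3, (30, 2))])
y = np.array([0] * 30 + [1] * 30)
W = np.array([[0.5, 0.5], [1.5, 1.5]])   # one prototype per class
W = lvq1_train(X, y, W, np.array([0, 1]))
print(W)
```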
∙ tessellation of feature space
[piece-wise linear]
∙ distance-based classification
[here: Euclidean distances]
∙ generalization ability
correct classification of new data
∙ aim: discrimination of classes
( ≠ vector quantization
or density estimation )
cost function based LVQ
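one example: Generalized LVQ (GLVQ) cost function [Sato&Yamada, 1995]
two winning prototypes:
w_J: closest prototype carrying the correct class label
w_K: closest prototype carrying any other class label
minimize
E = Σ_m Φ(e_m), with e_m = [d(w_J, x_m) - d(w_K, x_m)] / [d(w_J, x_m) + d(w_K, x_m)]
E favors
- small number of misclassifications, e.g. with sigmoidal Φ
- large margins between classes
- small d(w_J, x), large d(w_K, x)
- class-typical prototypes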
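As a sketch of how such a cost can be evaluated, the snippet below assumes the standard GLVQ form: for each example, the relative difference e = (d_J - d_K) / (d_J + d_K) between the distance d_J to the closest correct prototype and the distance d_K to the closest wrong prototype is passed through a function Φ and summed. Squared Euclidean distance, the steepness `gamma` of the sigmoid, and the toy data are assumptions made for illustration.

```python
import numpy as np

def glvq_cost(X, y, prototypes, proto_labels, phi=lambda e: e):
    """Sum of phi(e) over all examples, e = (d_J - d_K) / (d_J + d_K)."""
    total = 0.0
    for x, label in zip(X, y):
        d = np.sum((prototypes - x) ** 2, axis=1)
        d_J = d[proto_labels == label].min()   # closest prototype with the correct label
        d_K = d[proto_labels != label].min()   # closest prototype with a different label
        total += phi((d_J - d_K) / (d_J + d_K))  # e < 0: example is classified correctly
    return total

def sigmoid(e, gamma=5.0):
    """Sigmoidal choice of phi; gamma is a hypothetical steepness parameter."""
    return 1.0 / (1.0 + np.exp(-gamma * e))

X = np.array([[0.0, 0.0], [1.0, 1.0]])
y = np.array([0, 1])
W = np.array([[0.1, 0.0], [0.9, 1.0]])
c = np.array([0, 1])
print(glvq_cost(X, y, W, c))
print(glvq_cost(X, y, W, c, phi=sigmoid))
```

Since e lies between -1 and +1 and is negative exactly for correctly classified examples, a steep sigmoid makes the sum approximate the number of misclassifications.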
LVQ distance measures
? key question: appropriate distance / (dis-) similarity measure
fixed, pre-defined distance measures:
(G)LVQ can be formulated for general (differentiable) distances
examples: Minkowski distances (p≠2), correlation based,
statistical divergences, ... not necessarily metrics!
standard work-flow
- consider several distance measures according to prior knowledge
- compare performances in, e.g., cross-validation
elegant approach: Relevance Learning / adaptive distances
- employ parameterized distance measure
- optimize in the data-driven training process (cost function!)
Generalized Matrix Relevance LVQ (GMLVQ)
[Schneider, Biehl, Hammer, 2009]
generalized quadratic distance in LVQ:
d(w, x) = (w - x)^T Λ (w - x), with Λ = Ω^T Ω
variants:
one global, several local, class-wise relevance matrices
rectangular low-dim. representation / visualization
[Bunte et al., 2012]
diagonal matrices: single feature weights [Hammer et al., 2002]
training: adaptation of prototypes
and distance measure guided by
GLVQ cost function
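A minimal sketch of the generalized quadratic distance, assuming the usual GMLVQ parameterization Λ = Ω^T Ω, so that d(w, x) = (w - x)^T Λ (w - x) is the squared length of Ω (x - w). The three calls illustrate the variants named above (identity, diagonal, rectangular Ω); the example matrices are hypothetical, not trained.

```python
import numpy as np

def gmlvq_distance(x, w, omega):
    """d(w, x) = (x - w)^T Omega^T Omega (x - w), computed as |Omega (x - w)|^2."""
    diff = omega @ (x - w)
    return float(diff @ diff)

x = np.array([1.0, 2.0, 0.0])
w = np.array([0.0, 0.0, 0.0])
print(gmlvq_distance(x, w, np.eye(3)))                    # identity: squared Euclidean distance
print(gmlvq_distance(x, w, np.diag([1.0, 0.0, 1.0])))     # diagonal: single feature weights
print(gmlvq_distance(x, w, np.array([[1.0, 1.0, 0.0]])))  # rectangular: 1-dim. projection
```

Writing Λ as Ω^T Ω keeps the distance non-negative for any Ω, which is why training can adapt Ω freely.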
Relevance Matrix: interpretation
after training:
prototypes represent typical class properties or subtypes
diagonal element Λii summarizes
• the contribution of a single dimension
• the relevance of original features in the classifier
off-diagonal element Λij quantifies the contribution of the pair
of features (i,j) to the distance
Note: interpretation assumes implicitly that
features have equal order of magnitude
e.g. after z-score-transformation → zero mean, unit variance per feature
(averages over data set)
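The interpretation above can be illustrated with a short sketch: features are z-score transformed first, and the diagonal of Λ = Ω^T Ω is then read as feature relevances. Normalizing Λ to unit trace is an assumption made here so that the relevances sum to one; the matrix `omega` is a hypothetical example, not a trained one.

```python
import numpy as np

def zscore(X):
    """Column-wise z-score transformation: zero mean, unit variance per feature."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

def feature_relevances(omega):
    """Return (diagonal of Lambda, Lambda) for Lambda = Omega^T Omega, scaled to trace 1."""
    lam = omega.T @ omega
    lam = lam / np.trace(lam)
    return np.diag(lam), lam

omega = np.array([[2.0, 0.0, 0.0],
                  [0.0, 1.0, 1.0]])   # hypothetical Omega
rel, lam = feature_relevances(omega)
print(rel)         # relevance of each single feature
print(lam[1, 2])   # contribution of the feature pair (1, 2)
```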
three application examples
I) steroid metabolomics:
- detection of malignancy in adrenocortical tumors
based on urinary steroid metabolite excretion
GMLVQ: ~ 150 samples, 32-dim. feature vectors
II) cytokine expression data:
- diagnosis of (early) rheumatoid arthritis
based on synovial tissue samples
~ 50 samples represented by 117 cytokine expressions
in synovial tissue, PCA+GMLVQ combined
III) gene expression data:
- recurrence risk prediction from tumor samples
~ 400 samples, ~20000 dim. feature space
outlier analysis + GMLVQ on (80) pre-selected genes
Steroid metabolomics: detecting
malignancy in adrenocortical tumors
www.ensat.org
W. Arlt, M. Biehl, A. Taylor, S. Hahner, R. Libé, B. Hughes, P. Schneider,
D. Smith, H. Stiekema, N. Krone, E. Porfiri, G. Opocher, J. Bertherat,
F. Mantero, B. Allolio, M. Terzolo, P. Nightingale, C. Shackleton,
X. Bertagna, M.Fassnacht, P. Stewart
Urine Steroid Metabolomics as a Biomarker Tool for Detecting
Malignancy in Patients with Adrenal Tumors
J Clinical Endocrinology & Metabolism 96: 3775-3784 (2011)
steroid metabolomics
classification of adrenocortical tumors (adenoma vs. carcinoma):
benign ACA vs. malignant ACC
based on steroid hormone excretion profiles
features: 32 steroid metabolite excretion values
non-invasive measurement (24 hrs. urine samples)
aim: develop a novel biomarker tool for differential diagnosis
idea: identify characteristic steroid profiles (prototypes)
Generalized Matrix LVQ, ACC vs. ACA classification
∙ data divided into 90% training, 10% test set (z-score transformed)
∙ determine prototypes:
typical profiles (1 per class)
∙ adaptive generalized quadratic distance measure,
parameterized by a relevance matrix
∙ apply classifier to test data,
evaluate performance (error rates, ROC)
∙ repeat and average over many random splits
[Arlt et al., 2011]
[Biehl et al., 2012]
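The validation scheme above (repeated random 90% / 10% splits, z-score transformation, averaging of test errors) can be sketched as below. Two points are assumptions: the z-score statistics are taken from the training part only, and a nearest-class-mean classifier is used as a simple stand-in so that the example is self-contained. It is not GMLVQ, and all function names are hypothetical.

```python
import numpy as np

def fit_class_means(X, y):
    """Stand-in classifier: one class mean per class (not GMLVQ)."""
    classes = np.unique(y)
    return classes, np.array([X[y == c].mean(axis=0) for c in classes])

def predict_class_means(model, X):
    classes, means = model
    d = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    return classes[np.argmin(d, axis=1)]

def random_split_errors(X, y, fit, predict, n_runs=100, test_frac=0.1, seed=0):
    """Average test error over repeated random training / test splits."""
    rng = np.random.default_rng(seed)
    n_test = max(1, int(round(test_frac * len(X))))
    errors = []
    for _ in range(n_runs):
        perm = rng.permutation(len(X))
        test, train = perm[:n_test], perm[n_test:]
        # z-score transformation based on the training set only (assumption)
        mu, sd = X[train].mean(axis=0), X[train].std(axis=0)
        model = fit((X[train] - mu) / sd, y[train])
        pred = predict(model, (X[test] - mu) / sd)
        errors.append(np.mean(pred != y[test]))
    return float(np.mean(errors))

# toy data (hypothetical): two well separated classes
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (20, 2)), rng.normal(10.0, 1.0, (20, 2))])
y = np.array([0] * 20 + [1] * 20)
err = random_split_errors(X, y, fit_class_means, predict_class_means)
print(err)
```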
prototypes: steroid excretion in ACA/ACC
[Figure: prototype profiles for ACA and ACC]
Relevance matrix
[Figure: relevance of single markers; relevance of pairs of markers; frequency of markers to be among top 9]
subset of selected steroids ↔ technical realization (patented, UoB)
using 9 markers only: similar ROC
ROC characteristics
[Figure: ROC curves, sensitivity vs. (1-specificity)]
90% / 10% randomized splits of the data into training and test set,
averages over 1000 runs
AUC:
Euclidean: 0.87
diagonal rel.: 0.93
full matrix: 0.97
clear improvement due to adaptive distances
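For reference, the area under the ROC curve (AUC) can be computed directly from classifier scores. The sketch below uses the rank-based definition: the AUC is the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as one half. The scores and labels are made-up toy values, not data from the study.

```python
import numpy as np

def roc_auc(scores, labels):
    """AUC as the probability that a positive scores higher than a negative."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

scores = np.array([0.1, 0.4, 0.35, 0.8])   # hypothetical classifier outputs
labels = np.array([0, 0, 1, 1])            # 1 = positive class
print(roc_auc(scores, labels))
```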
Relevance matrix: diagonal and off-diagonal elements
[Figure: diagonal and off-diagonal elements of the relevance matrix; marker 19, ACA vs. ACC]
discriminative, e.g. steroid 19 (THS)
weakly discriminative markers: 5a-THA (8), TH-Doc (12)
highly discriminative combination of markers!
adrenocortical tumors
[Figure: ROC curves, sensitivity vs. (1-specificity); AUC: Euclidean 0.87, GRLVQ (diagonal rel.) 0.93, GMLVQ (full matrix) 0.97]
visualization of the data set
[Figure: ACA and ACC samples]
generic property: relevance matrix becomes highly singular
work in progress
• monitoring of patients after surgery and/or under medication
aim: recurrence detection / prediction
• high-throughput LC/MS assay to replace GC/MS
• other disorders affecting / related to steroid metabolism
• identification of tumor subtypes ?
• on-going prospective study w.r.t. ~ 2000 patients
Early diagnosis of Rheumatoid Arthritis
Expression of chemokines CXCL4 and CXCL7 by synovial
macrophages defines an early stage of rheumatoid arthritis
Annals of the Rheumatic Diseases 75:763-771 (2016)
L. Yeo, N. Adlard, M. Biehl, M. Juarez, M. Snow
C.D. Buckley, A. Filer, K. Raza, D. Scheel-Toellner
rheumatoid arthritis (RA)
[Figure: uninflamed control; established RA; early inflammation: resolving vs. early RA]
cytokine based diagnosis of RA
at earliest possible stage ?
ultimate goals:
understand pathogenesis and
mechanism of progression
synovial tissue cytokine expression
synovium → tissue section → mRNA extraction → real-time PCR
IL1A IL17F FASL CXCL4 CCL15 TGFB1 KITLG
IL1B IL18 CD70 CXCL5 CCL16 TGFB2 MST1
IL1RN IL19 CD30L CXCL6 CCL17 TGFB3 SPP1
IL2 IL20 4-1BB-L CXCL7 CCL18 EGF SFRP1
IL3 IL21 TRAIL CXCL9 CCL19 FGF2 ANXA1
IL4 IL22 RANKL CXCL10 CCL20 TGFA TNFRSF13B
IL5 IL23A TWEAK CXCL11 CCL21 IGF2 IL6R
IL6 IL24 APRIL CXCL12 CCL22 VEGFA NAMPT
IL7 IL25 BAFF CXCL13 CCL23 VEGFB C1QTNF3
IL8 IL26 LIGHT CXCL14 CCL24 MIF VCAM1
IL9 IL27 TL1A CXCL16 CCL25 LIF LGALS1
IL10 IL28A GITRL CCL1 CCL26 OSM LGALS9
IL11 IL29 FASLG CCL2 CCL27 ADIPOQ LGALS3
IL12A IL32 IFNA1 CCL3 CCL28 LEP LGALS12
IL12B IL33 IFNA2 CCL4 XCL1 GHRL
IL13 LTA IFNB1 CCL5 XCL2 RETN
IL14 TNF IFNG CCL7 CX3CL1 CTLA4
IL15 LTB CXCL1 CCL8 CSF1 EPO
IL16 OX40L CXCL2 CCL11 CSF2 TPO
IL17A CD40L CXCL3 CCL13 CSF3 FLT3LG
panel of 117 cytokines
• cell signaling proteins
• regulate immune response
• produced by, e.g.
T-cells, macrophages,
lymphocytes, fibroblasts, etc.
GMLVQ analysis
pre-processing:
• log-transformed expression values
• 21 leading principal components explain 95% of the variation
Two two-class problems: (A) established RA vs. uninflamed controls
(B) early RA vs. resolving inflammation
• 1 prototype per class, global relevance matrix Λ, distance measure:
d(w, x) = (w - x)^T Λ (w - x)
• leave-two-out validation (one from each class),
evaluation in terms of Receiver Operating Characteristics
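The pre-processing step above (log-transform, then keep the leading principal components that explain 95% of the variation) can be sketched with a plain SVD. The helper `back_transform` illustrates one way a relevance matrix learned in the PCA space could be mapped back to the original features, since d = (z - w_z)^T Λ_pca (z - w_z) with z = V (x - mean); whether the original analysis did exactly this is an assumption. The random expression values are hypothetical.

```python
import numpy as np

def pca_reduce(X, explained=0.95):
    """Project X (samples x features) onto the leading principal components
    that together explain at least the given fraction of the variance."""
    mean = X.mean(axis=0)
    U, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    ratio = np.cumsum(s ** 2) / np.sum(s ** 2)
    n_comp = min(int(np.searchsorted(ratio, explained)) + 1, len(s))
    V = Vt[:n_comp]                      # rows: principal directions
    return (X - mean) @ V.T, V

def back_transform(lam_pca, V):
    """Relevance matrix expressed in the original feature space."""
    return V.T @ lam_pca @ V

rng = np.random.default_rng(0)
expr = rng.lognormal(size=(50, 8))       # hypothetical expression values
Z, V = pca_reduce(np.log(expr))          # log-transformed before PCA
print(Z.shape, back_transform(np.eye(len(V)), V).shape)
```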
Matrix Relevance LVQ
[Figure: leave-one-out ROC curves (true positive rate vs. false positive rate) and diagonal relevances Λii vs. cytokine index i, for (A) established RA vs. uninflamed control and (B) early RA vs. resolving inflammation]
protein level studies
CXCL4: chemokine (C-X-C motif) ligand 4
CXCL7: chemokine (C-X-C motif) ligand 7
direct study on protein level, staining / imaging of synovial tissue:
macrophages: predominant source of CXCL4/7 expression
• high levels of CXCL4 and
CXCL7 in early RA
• expression on macrophages
outside of blood vessels
discriminates
early RA / resolving cases
relevant cytokines
[Figure: leave-one-out ROC curves (true positive rate vs. false positive rate) and diagonal relevances Λii vs. cytokine index i, for (A) established RA vs. uninflamed control and (B) early RA vs. resolving inflammation; label: macrophage stimulating 1]
work in progress
• more samples (difficult...) needed in order
to obtain a reliable early diagnosis
• integrated analysis of gene expression and other data
from the same / an analogous patient cohort
Predicting Recurrence in Clear Cell
Renal Cell Carcinoma
Analysis of TCGA data using Outlier Analysis and GMLVQ
Gargi Mukherjee … Rutgers University, New Jersey
Kevin Raines … Stanford University, California
Srikanth Sastry … JNC, Bengaluru, India
Sebastian Doniach … Stanford University, California
Gyan Bhanot … Rutgers University, New Jersey
Michael Biehl … University of Groningen, The Netherlands
In: Proc. IEEE Congress on Evolutionary Computation CEC 2016
clear cell Renal Cell Carcinoma (ccRCC)
data: publicly available datasets
The Cancer Genome Atlas (TCGA): cancergenome.nih.gov
also hosted at Broad Institute: gdac.broadinstitute.org
clear cell renal cell carcinoma
TCGA data @ Broad Institute
mRNA-Seq expression data X for 20532 genes
normalized, log-transformed:
Y = log(1+X)
65 normal samples
65 matched tumor samples
469 tumor samples in total
recurrence data:
[Figure: number of recurrences vs. days after diagnosis]
randomized split:
380 training samples → outlier analysis
89 test samples
outlier analysis (380 training samples)
per gene: determine
mean μ, standard deviation σ of Y
for each gene: identify outlier samples
Y > μ + σ: "high outlier"
Y < μ - σ: "low outlier"
restrict the following analysis to genes with
≥ 20 high outlier samples
or ≥ 20 low outlier samples
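The per-gene outlier rule above can be sketched as follows, assuming a matrix `Y` of log-transformed expression values with samples in rows and genes in columns. The function name and the tiny example matrix are hypothetical; the threshold of 20 outlier samples is exposed as the parameter `min_outliers`.

```python
import numpy as np

def outlier_analysis(Y, min_outliers=20):
    """Y: samples x genes. Returns boolean high / low outlier matrices and
    masks of the genes that have enough high or low outlier samples."""
    mu = Y.mean(axis=0)           # per-gene mean
    sigma = Y.std(axis=0)         # per-gene standard deviation
    high = Y > mu + sigma         # "high outlier" samples
    low = Y < mu - sigma          # "low outlier" samples
    keep_high = high.sum(axis=0) >= min_outliers
    keep_low = low.sum(axis=0) >= min_outliers
    return high, low, keep_high, keep_low

# tiny example (hypothetical): 5 samples, 2 genes, threshold lowered to 1
Y = np.array([[0.0, 1.0], [0.0, 1.0], [0.0, 1.0], [0.0, 1.0], [10.0, 1.0]])
high, low, keep_high, keep_low = outlier_analysis(Y, min_outliers=1)
print(high[:, 0], keep_high, keep_low)
```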
outlier analysis
Kaplan-Meier (KM) analysis per gene:
test for significant association of outlier status of samples with
recurrence
1546 "high-outlier genes"
with KM log rank p < 0.001
1628 "low-outlier genes"
with KM log rank p < 0.0005
construct two binary outlier matrices
1546 genes × 380 samples: "1" for high-outlier samples, "0" else
1628 genes × 380 samples: "1" for low-outlier samples, "0" else
→ PCA
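The per-gene test can be illustrated with a two-group log-rank test, comparing recurrence times of outlier samples (group 1) with all others (group 0). This is a textbook sketch, not the code used in the study: `time`, `event` (1 = recurrence observed, 0 = censored) and `group` are hypothetical input arrays, and the p-value uses the chi-square distribution with one degree of freedom, whose tail probability equals erfc(sqrt(chi2 / 2)).

```python
import math
import numpy as np

def logrank_p(time, event, group):
    """Two-group log-rank test; returns (chi2, p-value)."""
    time, event, group = map(np.asarray, (time, event, group))
    obs_minus_exp, var = 0.0, 0.0
    for t in np.unique(time[event == 1]):         # distinct event times
        at_risk = time >= t
        n = at_risk.sum()                          # samples at risk
        n1 = (at_risk & (group == 1)).sum()        # of which in group 1
        d = ((time == t) & (event == 1)).sum()     # events at time t
        d1 = ((time == t) & (event == 1) & (group == 1)).sum()
        obs_minus_exp += d1 - d * n1 / n           # observed minus expected
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = obs_minus_exp ** 2 / var if var > 0 else 0.0
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))

# toy example (hypothetical): group 1 recurs early, group 0 late
time = [5, 6, 7, 8, 1, 2, 3, 4]
event = [1] * 8
group = [0, 0, 0, 0, 1, 1, 1, 1]
print(logrank_p(time, event, group))
```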
outlier analysis
PCA reveals
four clusters of genes (A, B, C, D)
high outlier genes: two clusters of 1475 and 71 genes
low outlier genes: two clusters of 1402 and 226 genes
genes in large clusters (A,C):
outlier status associated
with early recurrence
genes in small clusters (B,D):
outlier status associated
with late recurrence
recurrence risk score
top 20 genes (by KM p-value) from each cluster A,B,C,D:
reference set of 80 genes
for each sample:
- determine outlier status w.r.t. the 80 genes (Y > μ + σ or Y < μ - σ)
- add up contributions per gene:
-1 if sample is outlier w.r.t. a gene in A or C (early rec.)
0 if sample is not an outlier w.r.t. the gene
+1 if sample is outlier w.r.t. a gene in B or D (late rec.)
recurrence risk score: -40 ≤ R ≤ +40
observe: median = 2 over the 380 training samples
crisp classification w.r.t. recurrence risk:
high risk (early recurrence) if R < 2
low risk (late recurrence) if R ≥ 2
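The scoring rule above can be written compactly. In this sketch, `is_outlier` is a hypothetical boolean matrix (samples × reference genes) holding the outlier status in the direction relevant for each gene, and `early_cluster` marks genes from clusters A or C. The threshold of 2 is the median reported for the training samples. The three-gene example only demonstrates the arithmetic.

```python
import numpy as np

def risk_score(is_outlier, early_cluster):
    """R = sum over genes of -1 (outlier, gene in A or C), +1 (outlier, gene in B or D), 0 else."""
    contrib = np.where(early_cluster, -1, 1)       # per-gene contribution if outlier
    return (is_outlier.astype(int) * contrib).sum(axis=1)

def classify(R, threshold=2):
    """Crisp classification: high risk if R < threshold, low risk otherwise."""
    return np.where(R < threshold, "high risk", "low risk")

# tiny example (hypothetical): 3 samples, 3 reference genes
early_cluster = np.array([True, False, False])
is_outlier = np.array([[True, False, False],
                       [False, True, True],
                       [False, False, False]])
R = risk_score(is_outlier, early_cluster)
print(R, classify(R))
```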
recurrence risk prediction
KM plots with respect to high / low risk groups:
training set (380 samples): log rank p < 1.e-16
test set (89 samples): log rank p < 1.e-4
• risk score R is predictive of the actual recurrence risk
• the 80 selected genes can serve as a prognostic panel
extreme case analysis
[Figure: number of recurrences over time]
2 classes:
class 1, low risk: > 5 years (late or no recurrence), 107 samples
class 2, high risk: ≤ 2 years (early), 109 samples
in between: (undefined)
• 80-dim. feature vectors
outlier analysis yields 4 groups (A,B,C,D) of 20
pre-selected genes associated with late/early recurrence
GMLVQ classifier
• one prototype vector per class
• adaptive distance for comparison of samples and prototypes:
d(w, x) = (w - x)^T Λ (w - x)
[Figure: prototype components (low expression | high expression) and diagonal elements of Λ, grouped by gene clusters A, B, C, D]
GMLVQ classifier
ROC of GMLVQ classifier (Leave-One-Out of the 216 extreme samples)
KM plot w.r.t. all 469 samples
(L-1-O for 216 samples, plus 253 undefined):
log rank p < 1.e-7
the set of 80 genes is also diagnostic:
• GMLVQ separates normal from tumor cells (close to) perfectly
• PCA of corresponding gene expressions:
65 normal samples
105 low risk samples (late rec.)
109 high risk samples (early rec.)
gradient from normal to high risk:
diagnostics?
most relevant genes (GMLVQ)
[Figure: 12 most relevant genes from GMLVQ classifier]
remarks and open questions
• GMLVQ suggests an even smaller panel of genes (12?)
identify a minimum panel for diagnostics and prognostics
• 80 genes do not necessarily reflect biological mechanisms
compare, e.g., with known pathways / modules of genes
• prospective studies
• more direct, multivariate identification of relevant genes by
dimension reduction + GMLVQ with back-transform
conclusion
prototype- and distance based systems:
- intuitive, transparent, interpretable
- classification, regression, unsupervised learning, visualization ...
- relevance learning: further insight into data and problem
- suitable for a variety of bio-medical problems
a recent review:
M. Biehl, B. Hammer, T. Villmann
Prototype-based models in Machine Learning
Advanced Review in: WIRES Cognitive Science 7(2): 92-111 (2016)
links
Matlab code:
Relevance and Matrix adaptation in Learning Vector
Quantization (GRLVQ, GMLVQ and LiRaM LVQ):
http://matlabserver.cs.rug.nl/gmlvqweb/web/
A no-nonsense beginners’ tool for GMLVQ:
http://www.cs.rug.nl/~biehl/gmlvq
(see also: Tutorial, Thursday 9:30)
Pre- and re-prints etc.:
http://www.cs.rug.nl/~biehl/
thanks
Barbara Hammer, Thomas Villmann, Wiebke Arlt, Dagmar Scheel-Toellner,
Petra Schneider, Kerstin Bunte, Gyan Bhanot

More Related Content

Similar to June 2017: Biomedical applications of prototype-based classifiers and relevance learning

Shorter Multi-marker Signatures: a new tool to facilitate cancer diagnosis
Shorter Multi-marker Signatures:  a new tool to facilitate cancer diagnosisShorter Multi-marker Signatures:  a new tool to facilitate cancer diagnosis
Shorter Multi-marker Signatures: a new tool to facilitate cancer diagnosisdanieltm33
 
Shorter Multimarker signatures: a new tool to facilitate cancer diagnosis
Shorter Multimarker signatures:  a new tool to facilitate cancer diagnosisShorter Multimarker signatures:  a new tool to facilitate cancer diagnosis
Shorter Multimarker signatures: a new tool to facilitate cancer diagnosisdanieltm33
 
Prediction Of Bioactivity From Chemical Structure
Prediction Of Bioactivity From Chemical StructurePrediction Of Bioactivity From Chemical Structure
Prediction Of Bioactivity From Chemical StructureJeremy Besnard
 
Bioinformatics-R program의 실례
Bioinformatics-R program의 실례Bioinformatics-R program의 실례
Bioinformatics-R program의 실례mothersafe
 
HEART DISEASE PREDICTION USING MACHINE LEARNING TECHNIQUES
HEART DISEASE PREDICTION USING MACHINE LEARNING TECHNIQUESHEART DISEASE PREDICTION USING MACHINE LEARNING TECHNIQUES
HEART DISEASE PREDICTION USING MACHINE LEARNING TECHNIQUESIRJET Journal
 
12918 2015 article_144 (1)
12918 2015 article_144 (1)12918 2015 article_144 (1)
12918 2015 article_144 (1)Anandsingh06
 
Metabolomics Society meeting 2011 - presentatie Kees
Metabolomics Society meeting 2011 - presentatie KeesMetabolomics Society meeting 2011 - presentatie Kees
Metabolomics Society meeting 2011 - presentatie Keesthehyve
 
Top Cited Articles in Signal & Image Processing 2021-2022
Top Cited Articles in Signal & Image Processing 2021-2022Top Cited Articles in Signal & Image Processing 2021-2022
Top Cited Articles in Signal & Image Processing 2021-2022sipij
 
Novel Class Detection Using RBF SVM Kernel from Feature Evolving Data Streams
Novel Class Detection Using RBF SVM Kernel from Feature Evolving Data StreamsNovel Class Detection Using RBF SVM Kernel from Feature Evolving Data Streams
Novel Class Detection Using RBF SVM Kernel from Feature Evolving Data Streamsirjes
 
A Threshold Fuzzy Entropy Based Feature Selection: Comparative Study
A Threshold Fuzzy Entropy Based Feature Selection:  Comparative StudyA Threshold Fuzzy Entropy Based Feature Selection:  Comparative Study
A Threshold Fuzzy Entropy Based Feature Selection: Comparative StudyIJMER
 
tadejko2007.pdf
tadejko2007.pdftadejko2007.pdf
tadejko2007.pdfMhartono
 
Case Study: Overview of Metabolomic Data Normalization Strategies
Case Study: Overview of Metabolomic Data Normalization StrategiesCase Study: Overview of Metabolomic Data Normalization Strategies
Case Study: Overview of Metabolomic Data Normalization StrategiesDmitry Grapov
 
2020: Prototype-based classifiers and relevance learning: medical application...
2020: Prototype-based classifiers and relevance learning: medical application...2020: Prototype-based classifiers and relevance learning: medical application...
2020: Prototype-based classifiers and relevance learning: medical application...University of Groningen
 
2016: Classification of FDG-PET Brain Data
2016: Classification of FDG-PET Brain Data2016: Classification of FDG-PET Brain Data
2016: Classification of FDG-PET Brain DataUniversity of Groningen
 
mHealth Israel_Connecting time-dots for Outcomes Prediction in Healthcare Big...
mHealth Israel_Connecting time-dots for Outcomes Prediction in Healthcare Big...mHealth Israel_Connecting time-dots for Outcomes Prediction in Healthcare Big...
mHealth Israel_Connecting time-dots for Outcomes Prediction in Healthcare Big...Levi Shapiro
 
A new CPXR Based Logistic Regression Method and Clinical Prognostic Modeling ...
A new CPXR Based Logistic Regression Method and Clinical Prognostic Modeling ...A new CPXR Based Logistic Regression Method and Clinical Prognostic Modeling ...
A new CPXR Based Logistic Regression Method and Clinical Prognostic Modeling ...Vahid Taslimitehrani
 
Exploiting technical replicate variance in omics data analysis (RepExplore)
Exploiting technical replicate variance in omics data analysis (RepExplore)Exploiting technical replicate variance in omics data analysis (RepExplore)
Exploiting technical replicate variance in omics data analysis (RepExplore)Enrico Glaab
 
Multivariate sample similarity measure for feature selection with a resemblan...
Multivariate sample similarity measure for feature selection with a resemblan...Multivariate sample similarity measure for feature selection with a resemblan...
Multivariate sample similarity measure for feature selection with a resemblan...IJECEIAES
 
An Influence of Measurement Scale of Predictor Variable on Logistic Regressio...
An Influence of Measurement Scale of Predictor Variable on Logistic Regressio...An Influence of Measurement Scale of Predictor Variable on Logistic Regressio...
An Influence of Measurement Scale of Predictor Variable on Logistic Regressio...IJECEIAES
 

Similar to June 2017: Biomedical applications of prototype-based classifiers and relevance learning (20)

Shorter Multi-marker Signatures: a new tool to facilitate cancer diagnosis
Shorter Multi-marker Signatures:  a new tool to facilitate cancer diagnosisShorter Multi-marker Signatures:  a new tool to facilitate cancer diagnosis
Shorter Multi-marker Signatures: a new tool to facilitate cancer diagnosis
 
Shorter Multimarker signatures: a new tool to facilitate cancer diagnosis
Shorter Multimarker signatures:  a new tool to facilitate cancer diagnosisShorter Multimarker signatures:  a new tool to facilitate cancer diagnosis
Shorter Multimarker signatures: a new tool to facilitate cancer diagnosis
 
Prediction Of Bioactivity From Chemical Structure
Prediction Of Bioactivity From Chemical StructurePrediction Of Bioactivity From Chemical Structure
Prediction Of Bioactivity From Chemical Structure
 
Bioinformatics-R program의 실례
Bioinformatics-R program의 실례Bioinformatics-R program의 실례
Bioinformatics-R program의 실례
 
HEART DISEASE PREDICTION USING MACHINE LEARNING TECHNIQUES
HEART DISEASE PREDICTION USING MACHINE LEARNING TECHNIQUESHEART DISEASE PREDICTION USING MACHINE LEARNING TECHNIQUES
HEART DISEASE PREDICTION USING MACHINE LEARNING TECHNIQUES
 
12918 2015 article_144 (1)
12918 2015 article_144 (1)12918 2015 article_144 (1)
12918 2015 article_144 (1)
 
Metabolomics Society meeting 2011 - presentatie Kees
Metabolomics Society meeting 2011 - presentatie KeesMetabolomics Society meeting 2011 - presentatie Kees
Metabolomics Society meeting 2011 - presentatie Kees
 
Top Cited Articles in Signal & Image Processing 2021-2022
Top Cited Articles in Signal & Image Processing 2021-2022Top Cited Articles in Signal & Image Processing 2021-2022
Top Cited Articles in Signal & Image Processing 2021-2022
 
Novel Class Detection Using RBF SVM Kernel from Feature Evolving Data Streams
Novel Class Detection Using RBF SVM Kernel from Feature Evolving Data StreamsNovel Class Detection Using RBF SVM Kernel from Feature Evolving Data Streams
Novel Class Detection Using RBF SVM Kernel from Feature Evolving Data Streams
 
A Threshold Fuzzy Entropy Based Feature Selection: Comparative Study
A Threshold Fuzzy Entropy Based Feature Selection:  Comparative StudyA Threshold Fuzzy Entropy Based Feature Selection:  Comparative Study
A Threshold Fuzzy Entropy Based Feature Selection: Comparative Study
 
tadejko2007.pdf
tadejko2007.pdftadejko2007.pdf
tadejko2007.pdf
 
Case Study: Overview of Metabolomic Data Normalization Strategies
Case Study: Overview of Metabolomic Data Normalization StrategiesCase Study: Overview of Metabolomic Data Normalization Strategies
Case Study: Overview of Metabolomic Data Normalization Strategies
 
2020: Prototype-based classifiers and relevance learning: medical application...
2020: Prototype-based classifiers and relevance learning: medical application...2020: Prototype-based classifiers and relevance learning: medical application...
2020: Prototype-based classifiers and relevance learning: medical application...
 
2016: Classification of FDG-PET Brain Data
2016: Classification of FDG-PET Brain Data2016: Classification of FDG-PET Brain Data
2016: Classification of FDG-PET Brain Data
 
Optimization 1
Optimization 1Optimization 1
Optimization 1
 
mHealth Israel_Connecting time-dots for Outcomes Prediction in Healthcare Big...
mHealth Israel_Connecting time-dots for Outcomes Prediction in Healthcare Big...mHealth Israel_Connecting time-dots for Outcomes Prediction in Healthcare Big...
mHealth Israel_Connecting time-dots for Outcomes Prediction in Healthcare Big...
 
A new CPXR Based Logistic Regression Method and Clinical Prognostic Modeling ...
A new CPXR Based Logistic Regression Method and Clinical Prognostic Modeling ...A new CPXR Based Logistic Regression Method and Clinical Prognostic Modeling ...
A new CPXR Based Logistic Regression Method and Clinical Prognostic Modeling ...
 
Exploiting technical replicate variance in omics data analysis (RepExplore)
Exploiting technical replicate variance in omics data analysis (RepExplore)Exploiting technical replicate variance in omics data analysis (RepExplore)
Exploiting technical replicate variance in omics data analysis (RepExplore)
 
Multivariate sample similarity measure for feature selection with a resemblan...
Multivariate sample similarity measure for feature selection with a resemblan...Multivariate sample similarity measure for feature selection with a resemblan...
Multivariate sample similarity measure for feature selection with a resemblan...
 
An Influence of Measurement Scale of Predictor Variable on Logistic Regressio...
An Influence of Measurement Scale of Predictor Variable on Logistic Regressio...An Influence of Measurement Scale of Predictor Variable on Logistic Regressio...
An Influence of Measurement Scale of Predictor Variable on Logistic Regressio...
 

More from University of Groningen

Interpretable machine learning in endocrinology, M. Biehl, APPIS 2024
Interpretable machine learning in endocrinology, M. Biehl, APPIS 2024Interpretable machine learning in endocrinology, M. Biehl, APPIS 2024
Interpretable machine learning in endocrinology, M. Biehl, APPIS 2024University of Groningen
 
Evidence for tissue and stage-specific composition of the ribosome: machine l...
Evidence for tissue and stage-specific composition of the ribosome: machine l...Evidence for tissue and stage-specific composition of the ribosome: machine l...
Evidence for tissue and stage-specific composition of the ribosome: machine l...University of Groningen
 
The statistical physics of learning revisted: Phase transitions in layered ne...
The statistical physics of learning revisted: Phase transitions in layered ne...The statistical physics of learning revisted: Phase transitions in layered ne...
The statistical physics of learning revisted: Phase transitions in layered ne...University of Groningen
 
Interpretable machine-learning (in endocrinology and beyond)
Interpretable machine-learning (in endocrinology and beyond)Interpretable machine-learning (in endocrinology and beyond)
Interpretable machine-learning (in endocrinology and beyond)University of Groningen
 
2020: Phase transitions in layered neural networks: ReLU vs. sigmoidal activa...
2020: Phase transitions in layered neural networks: ReLU vs. sigmoidal activa...2020: Phase transitions in layered neural networks: ReLU vs. sigmoidal activa...
2020: Phase transitions in layered neural networks: ReLU vs. sigmoidal activa...University of Groningen
 
2020: So you thought the ribosome was constant and conserved ...
2020: So you thought the ribosome was constant and conserved ... 2020: So you thought the ribosome was constant and conserved ...
2020: So you thought the ribosome was constant and conserved ... University of Groningen
 
Prototype-based models in machine learning
Prototype-based models in machine learningPrototype-based models in machine learning
Prototype-based models in machine learningUniversity of Groningen
 
The statistical physics of learning - revisited
The statistical physics of learning - revisitedThe statistical physics of learning - revisited
The statistical physics of learning - revisitedUniversity of Groningen
 
2013: Sometimes you can trust a rat - The sbv improver species translation ch...
2013: Sometimes you can trust a rat - The sbv improver species translation ch...2013: Sometimes you can trust a rat - The sbv improver species translation ch...
2013: Sometimes you can trust a rat - The sbv improver species translation ch...University of Groningen
 
2016: Predicting Recurrence in Clear Cell Renal Cell Carcinoma
2016: Predicting Recurrence in Clear Cell Renal Cell Carcinoma2016: Predicting Recurrence in Clear Cell Renal Cell Carcinoma
2016: Predicting Recurrence in Clear Cell Renal Cell CarcinomaUniversity of Groningen
 
2017: Prototype-based models in unsupervised and supervised machine learning
2017: Prototype-based models in unsupervised and supervised machine learning2017: Prototype-based models in unsupervised and supervised machine learning
2017: Prototype-based models in unsupervised and supervised machine learningUniversity of Groningen
 
January 2020: Prototype-based systems in machine learning
January 2020: Prototype-based systems in machine learning  January 2020: Prototype-based systems in machine learning
January 2020: Prototype-based systems in machine learning University of Groningen
 


June 2017: Biomedical applications of prototype-based classifiers and relevance learning

  • 1. Michael Biehl, Intelligent Systems, Johann Bernoulli Institute for Mathematics and Computing Science, University of Groningen / NL, www.cs.rug.nl/~biehl. Biomedical applications of prototype-based classifiers and relevance learning. Outline: introduction (prototype-based classification, relevance learning); Generalized Matrix Relevance LVQ; illustration: three bio-medical applications.
  • 2. AlCoB, June 2017, Aveiro / Portugal. supervised learning: classification / regression / prediction based on labeled example data. generic workflow: example data → (training) → model → (working) → apply to novel data. validation: estimate working performance; set parameters of model / training; compare different models. obvious performance measures: overall / class-wise accuracy, ROC, Precision-Recall, ... accuracy is not enough: interpretable “white-box” systems; example: prototype-based models, distance-based classifiers.
  • 3. distance-based classifiers. a simple distance-based system: the (K)NN classifier: store a set of labeled examples; classify a query according to the label of the Nearest Neighbor (or the majority of the K NN); piece-wise linear decision boundaries in the N-dim. feature space according to (e.g.) Euclidean distance from all examples. + conceptually simple; + no training phase; + only one parameter (K); - expensive (storage, computation); - sensitive to mislabeled data; - overly complex decision boundaries.
  • 4. prototype-based classification: represent the data by one or several prototypes per class; classify a query according to the label of the nearest prototype (or alternative schemes); local decision boundaries in the N-dim. feature space acc. to (e.g.) Euclidean distances. + robust, low storage needs, little computational effort; + parameterization in feature space, interpretability; - model selection: number of prototypes per class, etc.; requires training: placement of prototypes in feature space by Learning Vector Quantization [Kohonen, 1990].
  • 5. Learning Vector Quantization: N-dimensional data, feature vectors; identification of prototype vectors from labeled example data; distance-based classification (e.g. Euclidean). competitive learning, LVQ1 [Kohonen, 1990]: initialize prototype vectors for different classes; present a single example; identify the winner (closest prototype); move the winner closer towards the data (same class) or away from the data (different class).
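
The LVQ1 steps listed on slide 5 can be sketched in a few lines of Python. This is a minimal illustration, not the author's code: the function names, the learning rate `eta` and the use of squared Euclidean distance are assumptions made for the example.

```python
from typing import List, Sequence

def squared_euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def lvq1_step(prototypes: List[List[float]], labels: Sequence[int],
              x: Sequence[float], y: int, eta: float = 0.1) -> int:
    """Present one example (x, y) and update the winner in place.

    Returns the index of the winning (closest) prototype.
    eta is a hypothetical learning rate chosen for illustration.
    """
    # identify the winner: the prototype closest to the example
    winner = min(range(len(prototypes)),
                 key=lambda j: squared_euclidean(prototypes[j], x))
    # move towards the example if the labels agree, away otherwise
    sign = 1.0 if labels[winner] == y else -1.0
    prototypes[winner] = [w + sign * eta * (xi - w)
                          for w, xi in zip(prototypes[winner], x)]
    return winner
```

Calling `lvq1_step` repeatedly over randomly ordered examples gives the basic competitive training loop; initialization and learning-rate schedules are left open here.
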
  • 6. Learning Vector Quantization: N-dimensional data, feature vectors; identification of prototype vectors from labeled example data; distance-based classification (e.g. Euclidean). tessellation of feature space [piece-wise linear]; distance-based classification [here: Euclidean distances]; generalization ability: correct classification of new data; aim: discrimination of classes ( ≠ vector quantization or density estimation ).
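  • 7. cost function based LVQ. one example: Generalized LVQ (GLVQ) cost function E [Sato & Yamada, 1995], defined in terms of two winning prototypes per example: the closest prototype with the correct label and the closest prototype with a different label. minimizing E favors: a small number of misclassifications (e.g. with a sigmoidal cost term); large margins between classes; a small distance from the closest correct prototype and a large distance from the closest incorrect one; class-typical prototypes.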
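
For reference, the GLVQ cost function of Sato and Yamada is commonly written as follows. The notation is an assumption, since the symbols are not spelled out in the transcript: $x_i$ are the $P$ training examples, $w_J$ is the closest prototype carrying the correct label, $w_K$ the closest prototype carrying a different label, $d$ the distance measure, and $\Phi$ a monotonically increasing function (e.g. the identity or a sigmoid).

```latex
E = \sum_{i=1}^{P} \Phi(\mu_i),
\qquad
\mu_i = \frac{d(w_J, x_i) - d(w_K, x_i)}{d(w_J, x_i) + d(w_K, x_i)} \in [-1, 1]
```

An example is classified correctly exactly when $\mu_i < 0$, so lowering $E$ pushes $d(w_J, x_i)$ down and $d(w_K, x_i)$ up.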
  • 8. LVQ distance measures. key question: appropriate distance / (dis-)similarity measure? fixed, pre-defined distance measures: (G)LVQ can be formulated for general (differentiable) distances; examples: Minkowski distances (p≠2), correlation based, statistical divergences, ... (not necessarily metrics!). standard work-flow: consider several distance measures according to prior knowledge; compare performances in, e.g., cross-validation. elegant approach: Relevance Learning / adaptive distances: employ a parameterized distance measure; optimize it in the data-driven training process (cost function!).
  • 9. Generalized Matrix Relevance LVQ (GMLVQ): generalized quadratic distance in LVQ [Schneider, Biehl, Hammer, 2009].
  • 10. Generalized Matrix Relevance LVQ (GMLVQ): generalized quadratic distance in LVQ [Schneider, Biehl, Hammer, 2009]. variants: one global, several local, or class-wise relevance matrices; rectangular matrices: low-dim. representation / visualization [Bunte et al., 2012]; diagonal matrices: single feature weights [Hammer et al., 2002]. training: adaptation of prototypes and distance measure guided by the GLVQ cost function.
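
The generalized quadratic distance is usually parameterized as d(w, x) = (x − w)ᵀ Λ (x − w) with Λ = Ωᵀ Ω, which keeps Λ positive semi-definite. The sketch below assumes that parameterization (the symbol Ω and both function names are assumptions, not taken from the slides) and also covers a rectangular Ω, which projects the data to fewer dimensions.

```python
from typing import List, Sequence

Matrix = List[List[float]]

def gmlvq_distance(omega: Matrix, w: Sequence[float], x: Sequence[float]) -> float:
    """Quadratic distance |Omega (x - w)|^2 between prototype w and sample x.

    omega has one row per projected dimension and one column per feature;
    it may be square or rectangular.
    """
    diff = [xi - wi for xi, wi in zip(x, w)]
    projected = [sum(o * d for o, d in zip(row, diff)) for row in omega]
    return sum(p * p for p in projected)

def relevance_matrix(omega: Matrix) -> Matrix:
    """Relevance matrix Lambda = Omega^T Omega (features x features)."""
    n_features = len(omega[0])
    return [[sum(row[i] * row[j] for row in omega) for j in range(n_features)]
            for i in range(n_features)]
```

With the identity matrix as `omega` the distance reduces to the squared Euclidean distance; a diagonal `omega` corresponds to single feature weights.
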
  • 11. Relevance Matrix: interpretation. the diagonal element Λii summarizes the contribution of a single dimension, i.e. the relevance of the original feature i in the classifier; the off-diagonal element Λij quantifies the contribution of the pair of features (i,j) to the distance. Note: the interpretation assumes implicitly that features have equal order of magnitude, e.g. after z-score transformation (with averages over the data set). after training: prototypes represent typical class properties or subtypes.
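
A small sketch of the two ingredients of this interpretation: z-score transformation of each feature, and ranking features by the diagonal of the relevance matrix. The function names are hypothetical, and the population standard deviation is an assumption (the slides do not say which estimator is used).

```python
import statistics
from typing import List, Sequence

def zscore_columns(data: Sequence[Sequence[float]]) -> List[List[float]]:
    """Rescale every feature (column) to zero mean and unit variance."""
    columns = list(zip(*data))
    means = [statistics.mean(col) for col in columns]
    stds = [statistics.pstdev(col) for col in columns]
    if any(s == 0 for s in stds):
        raise ValueError("constant feature: z-score is undefined")
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in data]

def rank_features(relevance: Sequence[Sequence[float]]) -> List[int]:
    """Feature indices sorted by diagonal relevance, most relevant first."""
    diagonal = [relevance[i][i] for i in range(len(relevance))]
    return sorted(range(len(diagonal)), key=lambda i: diagonal[i], reverse=True)
```

Without the rescaling step, a feature measured on a small numerical scale could receive a large relevance simply to compensate for its magnitude.
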
  • 12. three application examples. I) steroid metabolomics: detection of malignancy in adrenocortical tumors based on urinary steroid metabolite excretion; GMLVQ: ~ 150 samples, 32-dim. feature vectors. II) cytokine expression data: diagnosis of (early) rheumatoid arthritis based on synovial tissue samples; ~ 50 samples represented by 117 cytokine expressions in synovial tissue; PCA + GMLVQ combined. III) gene expression data: recurrence risk prediction from tumor samples; ~ 400 samples, ~ 20000 dim. feature space; outlier analysis + GMLVQ on (80) pre-selected genes.
  • 13. Steroid metabolomics: detecting malignancy in adrenocortical tumors. www.ensat.org. W. Arlt, M. Biehl, A. Taylor, S. Hahner, R. Libé, B. Hughes, P. Schneider, D. Smith, H. Stiekema, N. Krone, E. Porfiri, G. Opocher, J. Bertherat, F. Mantero, B. Allolio, M. Terzolo, P. Nightingale, C. Shackleton, X. Bertagna, M. Fassnacht, P. Stewart: Urine Steroid Metabolomics as a Biomarker Tool for Detecting Malignancy in Patients with Adrenal Tumors. J Clinical Endocrinology & Metabolism 96: 3775-3784 (2011)
  • 14. steroid metabolomics (www.ensat.org): classification of adrenocortical tumors (adenoma vs. carcinoma), benign ACA vs. malignant ACC, based on steroid hormone excretion profiles. features: 32 steroid metabolite excretion values; non-invasive measurement (24 hrs. urine samples). aim: develop a novel biomarker tool for differential diagnosis. idea: identify characteristic steroid profiles (prototypes).
  • 15. steroid metabolomics: Generalized Matrix LVQ, ACC vs. ACA classification [Arlt et al., 2011] [Biehl et al., 2012]. data divided in 90% training, 10% test set (z-score transformed); determine prototypes: typical profiles (1 per class); adaptive generalized quadratic distance measure parameterized by the relevance matrix Λ; apply classifier to test data, evaluate performance (error rates, ROC); repeat and average over many random splits.
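
The validation scheme on slide 15 (random 90% / 10% splits, repeated and averaged) can be sketched as below. This is an assumed, simplified version: `evaluate` is a hypothetical callback that trains on the training indices and returns a score on the test indices, and the split is not stratified by class, which the slides do not specify either way.

```python
import random
from typing import Callable, List, Tuple

def random_split(n_samples: int, test_fraction: float,
                 rng: random.Random) -> Tuple[List[int], List[int]]:
    """Shuffle sample indices and return (train_indices, test_indices)."""
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_test = max(1, round(test_fraction * n_samples))
    return indices[n_test:], indices[:n_test]

def average_over_splits(n_samples: int,
                        evaluate: Callable[[List[int], List[int]], float],
                        n_runs: int = 1000, test_fraction: float = 0.1,
                        seed: int = 0) -> float:
    """Mean score of `evaluate` over n_runs independent random splits."""
    rng = random.Random(seed)
    scores = [evaluate(*random_split(n_samples, test_fraction, rng))
              for _ in range(n_runs)]
    return sum(scores) / len(scores)
```

Any pre-processing that uses data statistics, such as the z-score transformation, would normally be fitted on the training part of each split only.
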
  • 16. steroid metabolomics: prototypes: steroid excretion in ACA / ACC [figure].
  • 17. steroid metabolomics: Relevance matrix [figure]: relevance of single markers; relevance of pairs of markers; frequency of markers to be among the top 9. subset of selected steroids ↔ technical realization (patented, UoB): using 9 markers only, similar ROC.
  • 18. steroid metabolomics: ROC characteristics (sensitivity vs. 1-specificity): clear improvement due to adaptive distances. 90% / 10% randomized splits of the data in training and test set, averages over 1000 runs. AUC: Euclidean 0.87, diagonal rel. 0.93, full matrix 0.97.
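
The AUC values quoted above summarize a ROC curve in one number. A minimal sketch of how an AUC can be computed from classifier scores, using its equivalence to the probability that a randomly drawn positive example scores higher than a randomly drawn negative one (ties counted as one half). What serves as the score is an assumption here; for a prototype classifier one natural choice would be the difference of the distances to the two class prototypes.

```python
from typing import Sequence

def auc(scores_pos: Sequence[float], scores_neg: Sequence[float]) -> float:
    """Area under the ROC curve from scores of positive and negative samples.

    Higher scores are assumed to indicate the positive class.
    """
    if not scores_pos or not scores_neg:
        raise ValueError("need at least one sample from each class")
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))
```

The pairwise loop is quadratic in the number of samples, which is unproblematic for data sets of a few hundred samples.
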
  • 19. steroid metabolomics: Relevance matrix [figure]: diagonal and off-diagonal elements; discriminative single marker (ACA vs. ACC), e.g. steroid 19 (THS).
  • 20. steroid metabolomics: highly discriminative combination of markers! individually weakly discriminative markers: 5a-THA (8), TH-Doc (12).
  • 21. adrenocortical tumors: ROC (sensitivity vs. 1-specificity). AUC: Euclidean 0.87, diagonal rel. (GRLVQ) 0.93, full matrix (GMLVQ) 0.97.
  • 22. visualization of the data set (ACA, ACC) [figure]. generic property: relevance matrix becomes highly singular.
  • 23. work in progress: monitoring of patients after surgery and/or under medication, aim: recurrence detection / prediction; high-throughput LC/MS assay to replace GC/MS; other disorders affecting / related to steroid metabolism; identification of tumor subtypes?; on-going prospective study w.r.t. ~ 2000 patients.
  • 24. Early diagnosis of Rheumatoid Arthritis. L. Yeo, N. Adlard, M. Biehl, M. Juarez, M. Snow, C.D. Buckley, A. Filer, K. Raza, D. Scheel-Toellner: Expression of chemokines CXCL4 and CXCL7 by synovial macrophages defines an early stage of rheumatoid arthritis. Annals of the Rheumatic Diseases 75:763-771 (2016)
  • 25. rheumatoid arthritis (RA). patient groups: uninflamed control; established RA; early inflammation: resolving / early RA. cytokine based diagnosis of RA at earliest possible stage? ultimate goals: understand pathogenesis and mechanism of progression.
  • 26. synovial tissue cytokine expression: synovium → tissue section → mRNA extraction → real-time PCR. cytokines: cell signaling proteins; regulate immune response; produced by, e.g., T-cells, macrophages, lymphocytes, fibroblasts, etc. panel of 117 cytokines: IL1A IL17F FASL CXCL4 CCL15 TGFB1 KITLG IL1B IL18 CD70 CXCL5 CCL16 TGFB2 MST1 IL1RN IL19 CD30L CXCL6 CCL17 TGFB3 SPP1 IL2 IL20 4-1BB-L CXCL7 CCL18 EGF SFRP1 IL3 IL21 TRAIL CXCL9 CCL19 FGF2 ANXA1 IL4 IL22 RANKL CXCL10 CCL20 TGFA TNFRSF13B IL5 IL23A TWEAK CXCL11 CCL21 IGF2 IL6R IL6 IL24 APRIL CXCL12 CCL22 VEGFA NAMPT IL7 IL25 BAFF CXCL13 CCL23 VEGFB C1QTNF3 IL8 IL26 LIGHT CXCL14 CCL24 MIF VCAM1 IL9 IL27 TL1A CXCL16 CCL25 LIF LGALS1 IL10 IL28A GITRL CCL1 CCL26 OSM LGALS9 IL11 IL29 FASLG CCL2 CCL27 ADIPOQ LGALS3 IL12A IL32 IFNA1 CCL3 CCL28 LEP LGALS12 IL12B IL33 IFNA2 CCL4 XCL1 GHRL IL13 LTA IFNB1 CCL5 XCL2 RETN IL14 TNF IFNG CCL7 CX3CL1 CTLA4 IL15 LTB CXCL1 CCL8 CSF1 EPO IL16 OX40L CXCL2 CCL11 CSF2 TPO IL17A CD40L CXCL3 CCL13 CSF3 FLT3LG
  • 27. GMLVQ analysis. pre-processing: log-transformed expression values; 21 leading principal components explain 95% of the variation. two two-class problems: (A) established RA vs. uninflamed controls; (B) early RA vs. resolving inflammation. 1 prototype per class, global relevance matrix (adaptive quadratic distance measure); leave-two-out validation (one from each class); evaluation in terms of Receiver Operating Characteristics.
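
Leave-two-out validation with one held-out sample from each class can be sketched as a generator of folds. The slides do not state whether all such pairs or only a random subset were used; enumerating every pair is an assumption of this example, as is the function name.

```python
from typing import Iterator, List, Sequence, Tuple

def leave_two_out(labels: Sequence[int]) -> Iterator[Tuple[List[int], Tuple[int, int]]]:
    """Yield (train_indices, (held_out_a, held_out_b)) for a two-class problem.

    Each fold holds out exactly one sample of each class.
    """
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError("leave-two-out as sketched here needs exactly two classes")
    first = [i for i, y in enumerate(labels) if y == classes[0]]
    second = [i for i, y in enumerate(labels) if y == classes[1]]
    for a in first:
        for b in second:
            train = [i for i in range(len(labels)) if i != a and i != b]
            yield train, (a, b)
```

Holding out one sample per class keeps the class balance of the training set the same in every fold and yields one positive and one negative score per fold for the ROC analysis.
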
  • 28. Matrix Relevance LVQ [figure]: leave-one-out ROC (true positive rate vs. false positive rate) and diagonal relevances Λii vs. cytokine index i, for (A) established RA vs. uninflamed control and (B) early RA vs. resolving inflammation.
  • 29. protein level studies. CXCL4: chemokine (C-X-C motif) ligand 4; CXCL7: chemokine (C-X-C motif) ligand 7. direct study on protein level, staining / imaging of synovial tissue: macrophages are the predominant source of CXCL4/7 expression; high levels of CXCL4 and CXCL7 in early RA; expression on macrophages outside of blood vessels discriminates early RA / resolving cases.
  • 30. relevant cytokines [figure]: leave-one-out ROC (true positive rate vs. false positive rate) and diagonal relevances Λii vs. cytokine index i, for (A) established RA vs. uninflamed control and (B) early RA vs. resolving inflammation; highlighted: macrophage stimulating 1.
  • 31. work in progress: more samples (difficult...) needed in order to obtain a reliable early diagnosis; integrated analysis of gene expression and other data from the same / an analogous patient cohort.
  • 32. Predicting Recurrence in Clear Cell Renal Cell Carcinoma: Analysis of TCGA data using Outlier Analysis and GMLVQ. Gargi Mukherjee (Rutgers University, New Jersey), Kevin Raines (Stanford University, California), Srikanth Sastry (JNC, Bengaluru, India), Sebastian Doniach (Stanford University, California), Gyan Bhanot (Rutgers University, New Jersey), Michael Biehl (University of Groningen, The Netherlands). In: Proc. IEEE Congress on Evolutionary Computation CEC 2016
  • 33. data: clear cell Renal Cell Carcinoma (ccRCC). publicly available datasets: The Cancer Genome Atlas (TCGA), cancergenome.nih.gov; also hosted at Broad Institute, gdac.broadinstitute.org.
  • 34. data: clear cell renal cell carcinoma, TCGA data @ Broad Institute. mRNA-Seq expression data X for 20532 genes, normalized, log-transformed: Y = log(1+X). 65 normal samples; 65 matched tumor samples; 469 tumor samples in total. recurrence data [figure]: number of recurrences vs. days after diagnosis.
  • 35. fast forward to machine learning analysis: randomized split into 380 training samples (outlier analysis) and 89 test samples.
  • 36. outlier analysis (380 training samples). per gene: determine mean μ and standard deviation σ of Y. for each gene, identify outlier samples: Y > μ + σ: “high outlier”; Y < μ - σ: “low outlier”. restrict the following analysis to genes with ≥ 20 high outlier samples or ≥ 20 low outlier samples.
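
The per-gene outlier rule from slide 36 translates directly into code. The sketch below is illustrative only: the function names and the dictionary layout are hypothetical, and the population standard deviation is an assumption (the slides do not specify the estimator).

```python
import statistics
from typing import Dict, List, Sequence, Tuple

def outlier_flags(values: Sequence[float]) -> Tuple[List[bool], List[bool]]:
    """Flag high (Y > mu + sigma) and low (Y < mu - sigma) outlier samples
    for the log-transformed expression values of a single gene."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    high = [v > mu + sigma for v in values]
    low = [v < mu - sigma for v in values]
    return high, low

def select_genes(expression: Dict[str, Sequence[float]],
                 min_outliers: int = 20) -> List[str]:
    """Keep genes with at least min_outliers high or low outlier samples."""
    selected = []
    for gene, values in expression.items():
        high, low = outlier_flags(values)
        if sum(high) >= min_outliers or sum(low) >= min_outliers:
            selected.append(gene)
    return selected
```

The resulting boolean vectors per gene are the rows of binary outlier matrices with entries 1 for outlier samples and 0 otherwise.
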
  • 37. outlier analysis. construct two binary outlier matrices (380 samples × genes): “1” for high-outlier samples, “0” else; “1” for low-outlier samples, “0” else. Kaplan-Meier (KM) analysis per gene: test for significant association of outlier status of samples with recurrence. result: 1546 “high-outlier genes” with KM log rank p < 0.001; 1628 “low-outlier genes” with KM log rank p < 0.0005. PCA of the 380 × 1546 and 380 × 1628 matrices.
  • 38. outlier analysis: PCA reveals four clusters of genes. high outlier genes: clusters A (1475 genes) and B (71 genes); low outlier genes: clusters C (1402 genes) and D (226 genes). genes in small clusters (B, D): outlier status associated with late recurrence; genes in large clusters (A, C): outlier status associated with early recurrence.
  • 39. recurrence risk score. reference set of 80 genes: top 20 genes (by KM p-value) from each cluster A, B, C, D. for each sample: determine outlier status w.r.t. the 80 genes (Y > μ + σ or Y < μ - σ) and add up contributions per gene: -1 if the sample is an outlier w.r.t. a gene in A or C (early rec.); 0 if the sample is not an outlier w.r.t. the gene; +1 if the sample is an outlier w.r.t. a gene in B or D (late rec.). recurrence risk score R with -40 ≤ R ≤ +40; observe: median = 2 over the 380 training samples. crisp classification w.r.t. recurrence risk: high risk (early recurrence) if R < 2; low risk (late recurrence) if R ≥ 2.
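
The risk score R described on slide 39 is a signed count over the 80 reference genes. A minimal sketch, assuming the outlier status of the sample has already been determined per gene in the relevant direction; the function names and the dictionary-based inputs are hypothetical.

```python
from typing import Dict

EARLY_CLUSTERS = ("A", "C")  # outlier status associated with early recurrence
LATE_CLUSTERS = ("B", "D")   # outlier status associated with late recurrence

def risk_score(is_outlier: Dict[str, bool], cluster_of: Dict[str, str]) -> int:
    """Sum of per-gene contributions: -1 (A or C), +1 (B or D), 0 if no outlier."""
    score = 0
    for gene, cluster in cluster_of.items():
        if cluster not in EARLY_CLUSTERS + LATE_CLUSTERS:
            raise ValueError(f"unknown cluster {cluster!r} for gene {gene!r}")
        if is_outlier.get(gene, False):
            score += -1 if cluster in EARLY_CLUSTERS else 1
    return score

def risk_group(score: int, threshold: int = 2) -> str:
    """Crisp classification: high risk if R < threshold, low risk otherwise."""
    return "high risk" if score < threshold else "low risk"
```

With 20 genes per cluster the score is bounded by -40 and +40, and the default threshold of 2 is the training-set median quoted on the slide.
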
  • 40. recurrence risk prediction. KM plots with respect to high / low risk groups: training set (380 samples): log rank p < 1.e-16; test set (89 samples): log rank p < 1.e-4. the risk score R is predictive of the actual recurrence risk; the 80 selected genes can serve as a prognostic panel.
  • 41. extreme case analysis. 2 classes by time of recurrence: ≤ 2 years (early): 109 samples, class 2, high risk; > 5 years (late or no recurrence): 107 samples, class 1, low risk; in between: undefined. 80-dim. feature vectors: outlier analysis yields 4 groups (A, B, C, D) of 20 pre-selected genes associated with late / early recurrence.
  • 42. GMLVQ classifier. one prototype vector per class [figure: prototype components for gene groups A, B, C, D; low expression | high expression]; adaptive distance for comparison of samples and prototypes [figure: diagonal elements of Λ for gene groups A, B, C, D].
  • 43. GMLVQ classifier. ROC of the GMLVQ classifier (Leave-One-Out of the 216 extreme samples). KM plot w.r.t. all 469 samples (L-1-O for 216 samples, plus 253 undefined): log rank p < 1.e-7.
  • 44. the set of 80 genes is also diagnostic: GMLVQ separates normal from tumor cells (close to) perfectly. PCA of corresponding gene expressions: 65 normal samples, 105 low risk samples (late rec.), 109 high risk samples (early rec.); gradient from normal to high risk: diagnostics?
  • 45. most relevant genes (GMLVQ): 12 most relevant genes from GMLVQ classifier [figure].
  • 46. remarks and open questions. GMLVQ suggests an even smaller panel of genes (12?): identify a minimum panel for diagnostics and prognostics. the 80 genes do not necessarily reflect biological mechanisms: compare, e.g., with known pathways / modules of genes. prospective studies. more direct, multivariate identification of relevant genes by dimension reduction + GMLVQ with back-transform.
  • 47. conclusion. prototype- and distance-based systems: intuitive, transparent, interpretable; classification, regression, unsupervised learning, visualization ...; relevance learning: further insight into data and problem; suitable for a variety of bio-medical problems. a recent review: M. Biehl, B. Hammer, T. Villmann: Prototype-based models in Machine Learning. Advanced Review in: WIRES Cognitive Science 7(2): 92-111 (2016)
  • 48. links. Matlab code: Relevance and Matrix adaptation in Learning Vector Quantization (GRLVQ, GMLVQ and LiRaM LVQ): http://matlabserver.cs.rug.nl/gmlvqweb/web/. Pre- and re-prints etc.: http://www.cs.rug.nl/~biehl/. A no-nonsense beginners’ tool for GMLVQ: http://www.cs.rug.nl/~biehl/gmlvq (see also: Tutorial, Thursday 9:30)
  • 49. thanks: Barbara Hammer, Thomas Villmann, Wiebke Arlt, Dagmar Scheel-Toellner, Petra Schneider, Kerstin Bunte, Gyan Bhanot