Interpretable machine learning in endocrinology
• tumor classification
• diagnosis of primary aldosteronism
Michael Biehl
www.cs.rug.nl/~biehl
Bernoulli Institute for Mathematics,
Computer Science and Artificial Intelligence
University of Groningen, The Netherlands
Centre for Systems Modelling &
Quantitative Biomedicine
University of Birmingham, UK
January 2024
overview
Generalized Matrix Relevance Learning Vector Quantization
- interpretable, prototype- & distance-based classification
- relevance learning
Steroid metabolomics
- classification of adrenocortical tumors
- differential diagnosis of Primary Aldosteronism
- other applications
Summary / Remarks
The mysterious learning machine
© C.M. Gerigk, E.L. van den Brandhof
shallow systems
small data
• training: represent data by one or
several prototypes per class
• working: classify a query according to
the label of the nearest prototype (± bias)
• decision boundaries according
to (Euclidean) distances
+ parameterized in feature space,
intuitive and interpretable
one intuitive, interpretable framework:
prototype-based systems for distance-based classification
Learning Vector Quantization (LVQ)
[Figure: N-dim. feature space, axes x1 and x2, query point "?"]
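The nearest-prototype rule on this slide can be sketched in a few lines. This is a minimal illustration, assuming squared Euclidean distance and no bias term; the toy prototypes, labels and the function name `nearest_prototype_label` are hypothetical and not taken from the talk.

```python
import numpy as np

def nearest_prototype_label(x, prototypes, labels):
    """Return the label of the prototype closest to x (squared Euclidean distance)."""
    diffs = prototypes - x                        # shape (K, N)
    dists = np.einsum("kn,kn->k", diffs, diffs)   # squared distances, shape (K,)
    return labels[int(np.argmin(dists))]

# Hypothetical toy setup: two classes in a 2-dim. feature space, one prototype each.
prototypes = np.array([[0.0, 0.0], [3.0, 3.0]])
labels = np.array(["A", "B"])
query = np.array([0.5, 1.0])
result = nearest_prototype_label(query, prototypes, labels)
```

With one prototype per class the decision boundary is the hyperplane halfway between the two prototypes, which is what makes the classifier easy to inspect in feature space.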
distance measure compares prototypes and data points: $d(w, x)$
prototypes: $w = (w_1, w_2, \ldots, w_N) \in \mathbb{R}^N$
data points: $x = (x_1, x_2, \ldots, x_N) \in \mathbb{R}^N$
generalized measure:
$$d^{\Lambda}(w, x) = \sum_{i,j=1}^{N} (w_i - x_i)\, \Lambda_{ij}\, (w_j - x_j)$$
$\Lambda_{ii}$: relevance of a particular single feature
$\Lambda_{ij}$: contribution of a pair of features
training: optimize prototypes $\{ w^{(k)} \}_{k=1}^{K}$ and relevance matrix $\Lambda$
w.r.t. performance on training data (objective function)
Generalized Matrix Relevance LVQ (GMLVQ)
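A small sketch of the generalized distance d^Λ(w, x) = Σ_ij (w_i − x_i) Λ_ij (w_j − x_j). The example values are hypothetical. Building Λ as Ω^T Ω and normalising its trace to one are assumptions made here so that the distance is non-negative and the diagonal elements are comparable; the slide itself does not state them.

```python
import numpy as np

def generalized_distance(w, x, Lam):
    """d^Lambda(w, x) = sum_ij (w_i - x_i) Lam_ij (w_j - x_j)."""
    diff = w - x
    return float(diff @ Lam @ diff)

# Hypothetical 3-dim. example. Lam = Omega^T Omega is an assumption that
# guarantees a non-negative distance; it is not stated on the slide.
Omega = np.array([[1.0, 0.5, 0.0],
                  [0.0, 1.0, 0.0],
                  [0.0, 0.0, 0.1]])
Lam = Omega.T @ Omega
Lam = Lam / np.trace(Lam)   # normalisation sum_i Lam_ii = 1 (also an assumption)

w = np.array([1.0, 2.0, 3.0])
x = np.array([0.0, 2.0, 5.0])
d = generalized_distance(w, x, Lam)
d_euclid = generalized_distance(w, x, np.eye(3))  # identity recovers squared Euclidean
```

Setting Λ to the identity gives the plain squared Euclidean distance, so the generalized measure contains the basic LVQ distance as a special case.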
steroidogenesis (adrenal gland)
[Figure: steroidogenesis pathway.
 Steroids: Cholesterol, Pregnenolone, 17Preg, Progesterone, 17OHP, Deoxycorticosterone, 11-Deoxycortisol, Corticosterone, Cortisol, Cortisone, 18OH-Corticosterone, Aldosterone, DHEA, Androstenedione, Testosterone, DHT.
 Enzymes: CYP11A1, CYP17A1, CYP21A2, CYP11B1, CYP11B2, HSD3B2, HSD11B1, HSD11B2, HSD17B3, SRD5A2.
 Groups: Mineralocorticoid precursors, Mineralocorticoids, Glucocorticoid precursors, Glucocorticoids, Androgen precursors, Androgens.]
possibly: + hybrid steroids (GC / MC)
www.ensat.org
urinary steroid metabolomics (USM)
gas chromatography-mass spectrometry (GC-MS)
Healthy Controls
USM for tumor classification
adrenocortical tumors (adenoma vs. carcinoma):
benign ACA, malignant ACC
features: e.g. 32 steroid metabolite excretion values
non-invasive measurement (24 hrs. urine)
[Figure: set of labelled example data; axis label: steroid #]
aim: develop a tool / support system for differential diagnosis
idea: analyse retrospective data by machine learning
identify characteristic steroid prototypes and relevances
www.ensat.org
2009
Generalized Matrix LVQ, ACC vs. ACA classification
• pre-processing: log-transformation of excretion values
• data split into 90% training, 10% validation set
• training: determine prototypes and relevance matrix
  - prototypes: representative profiles (1 per class)
  - relevance matrix $\Lambda \in \mathbb{R}^{32 \times 32}$: parameterizes distance measure
• validation: apply classifier to 10% hold-out data
  evaluate expected performance (error rates, ROC, … )
• repeat and average results over many random splits
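The validation protocol (log-transform, random 90% / 10% split, repeat and average) can be sketched as follows. This is only an illustration under stated assumptions: the classifier is a nearest-class-mean rule used as a stand-in for GMLVQ training, the data are synthetic, and the function name `repeated_holdout_error` is hypothetical.

```python
import numpy as np

def repeated_holdout_error(X, y, n_splits=100, val_fraction=0.1, seed=0):
    """Average validation error of a nearest-class-mean classifier over random splits."""
    rng = np.random.default_rng(seed)
    X_log = np.log(X)                        # pre-processing: log-transformed values
    n = len(y)
    n_val = max(1, int(round(val_fraction * n)))
    classes = np.unique(y)
    errors = []
    for _ in range(n_splits):
        perm = rng.permutation(n)
        val, train = perm[:n_val], perm[n_val:]
        # "training": one prototype per class, here the class mean (stand-in for GMLVQ)
        protos = np.array([X_log[train][y[train] == c].mean(axis=0) for c in classes])
        # "validation": nearest-prototype decision on the hold-out data
        d = ((X_log[val][:, None, :] - protos[None, :, :]) ** 2).sum(axis=2)
        pred = classes[np.argmin(d, axis=1)]
        errors.append(np.mean(pred != y[val]))
    return float(np.mean(errors))

# Synthetic, well separated two-class data with positive values (not study data).
rng = np.random.default_rng(1)
X = np.vstack([rng.lognormal(mean=0.0, sigma=0.3, size=(50, 4)),
               rng.lognormal(mean=2.0, sigma=0.3, size=(50, 4))])
y = np.array([0] * 50 + [1] * 50)
avg_error = repeated_holdout_error(X, y, n_splits=50)
```

Averaging over many random splits reduces the dependence of the estimate on one particular hold-out set, which matters for small data sets.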
tumor classification
ROC characteristics (sensitivity vs. 1-specificity)
validation set performance, on average over 1000 randomized splits
AUC:
- Euclidean (no relevances): 0.87
- diagonal rel. (only diagonal): 0.93
- full matrix (full): 0.97
clear improvement due to relevance learning
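For reference, an AUC value can be computed directly from classifier scores as the fraction of (positive, negative) pairs that are ranked correctly. A minimal sketch, assuming a real-valued score per case; the score definition in the comment and the numbers are hypothetical, not the study's.

```python
import numpy as np

def roc_auc(scores, labels):
    """AUC as the probability that a positive case scores higher than a negative one."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    # pairwise comparison of all positive/negative pairs; ties count one half
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (len(pos) * len(neg)))

# Hypothetical scores, e.g. d(w_ACA, x) - d(w_ACC, x); larger means "more ACC-like".
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
auc = roc_auc(scores, labels)
```

In this toy example 8 of the 9 pairs are ordered correctly. Shifting a decision threshold along the score axis traces out the ROC curve.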
insights beyond accuracy?
insights: relevance matrix $\Lambda \in \mathbb{R}^{32 \times 32}$
- diagonal $\Lambda_{ii}$: importance of single markers
- off-diagonal $\Lambda_{ij}$: … pairs of markers (detailed inspection)
[Figure: relevances; labelled markers: 5-PT, 5-PD, THS]
facilitates selection of reduced panels with similar performance
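One simple way to propose a reduced panel is to rank markers by their diagonal relevance Λ_ii and keep the top k. This is a sketch of that idea only; the selection rule, the matrix values and the marker names are hypothetical, and the procedure actually used in the study is not given on the slide.

```python
import numpy as np

def top_k_markers(Lam, names, k):
    """Return the k marker names with the largest diagonal relevances Lam_ii."""
    relevances = np.diag(Lam)
    order = np.argsort(relevances)[::-1]     # indices sorted by decreasing relevance
    return [names[i] for i in order[:k]]

# Hypothetical 4x4 relevance matrix and marker names (not the values from the study).
Lam = np.array([[0.05, 0.00, 0.01, 0.00],
                [0.00, 0.50, 0.10, 0.00],
                [0.01, 0.10, 0.30, 0.02],
                [0.00, 0.00, 0.02, 0.15]])
names = ["m1", "m2", "m3", "m4"]
panel = top_k_markers(Lam, names, k=2)
```

A classifier restricted to the selected panel would have to be retrained and validated again before claiming similar performance.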
confirm - surprise - visualize
[Figure: relevances; ACA vs. ACC for marker (19) THS]
(19) THS: individually discriminative
confirm - surprise - visualize
[Figure: relevances; ACA vs. ACC for markers (8) 5α THA and (12) TH-Doc]
(8) 5α THA, (12) TH-Doc ?
GMLVQ: multivariate analysis
discriminative combination
confirm - surprise - visualize
relevance matrix is dominated by leading eigenvectors
• visualize data set (ACA, ACC) and prototypes
  - misclassifications?
• inspect individual cases
  - uncertain cases
  - outliers
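If the relevance matrix is dominated by a few leading eigenvectors, data and prototypes can be plotted in the space they span. A minimal sketch, assuming a symmetric, positive semi-definite Λ; the rank-2 matrix and the random data are hypothetical stand-ins.

```python
import numpy as np

def project_on_leading_eigenvectors(X, Lam, n_components=2):
    """Project rows of X onto the leading eigenvectors of Lam, scaled by sqrt(eigenvalue)."""
    eigvals, eigvecs = np.linalg.eigh(Lam)             # eigen-decomposition of symmetric Lam
    idx = np.argsort(eigvals)[::-1][:n_components]     # leading components first
    scale = np.sqrt(np.clip(eigvals[idx], 0.0, None))  # guard against tiny negative values
    return (X @ eigvecs[:, idx]) * scale

# Hypothetical rank-2 relevance matrix in 4 dimensions: Lam = Omega^T Omega.
rng = np.random.default_rng(0)
Omega = rng.normal(size=(2, 4))
Lam = Omega.T @ Omega
X = rng.normal(size=(5, 4))
Z = project_on_leading_eigenvectors(X, Lam)
```

With this scaling, squared Euclidean distances in the 2-dim. plot reproduce the generalized distances exactly when Λ has rank 2, and approximately when the remaining eigenvalues are small.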
excellent performance of USM + machine learning
suggests a triple test strategy with excellent sensitivity and specificity
currently working on practical implementation in clinical practice
prospective study, 2020
primary aldosteronism (PA)
PA
- causes 5-10% of hypertension cases
- most frequent form of secondary hypertension
- increased risk for cardio- and cerebrovascular complications
PA subtypes and main treatment:
- UPA (unilateral PA): one adrenal gland affected by aldosterone producing adenoma (APA);
  several driver mutations in the tumor are known
  main treatment: surgery
- BPA (bilateral PA): both adrenal glands over-producing, most frequently due to hyperplasia
  main treatment: mineralocorticoid antagonists
patient cohort, mutations (UPA)
Healthy Control vs. all PA
near perfect discrimination of primary aldosteronism and controls
[Figure: relevances]
PA related classification problems
(similar results for RF)
KCNJ5 vs. all other PA
very good discrimination of KCNJ5 type vs. non-KCNJ5 PA
[Figure: relevances]
main findings
- all PA vs. HC: excellent separation, characterized by increased excretion
  of mineralocorticoid and glucocorticoid precursors
- all UPA vs. all BPA: suboptimal discrimination
- KCNJ5 vs. non-KCNJ5: very good discrimination (key: hybrid steroid 18-oxo-THF)
potential added value:
- KCNJ5-positive cases are always unilateral:
  avoid invasive test (adrenal vein sampling)
- improved therapy selection: KCNJ5-positive cases
  respond better to treatment
ongoing & future work on PA
more detailed relevance analysis
- Iterated Relevance Matrix Analysis (IRMA)
  S.S. Lövdal and M. Biehl, Proc. ESANN 2023
  journal manuscript under review
improved classifiers
- UPA vs. BPA, other subtypes of PA (?)
- multi-class problem wrt mutations (more data needed)
LC-MS instead of GC-MS
- faster, cheaper assessment of the steroid metabolome
- also in other applications of USM + machine learning
other application examples of steroid metabolomics
...
IEEE Members News, March 2021
exploit domain knowledge
(c) https://twitter.com/jessenleon
Let the data speak for itself
when the data cleans itself - unknown
open access, 2023, 290 pages
University of Groningen Press
m.biehl@rug.nl
www.cs.rug.nl/~biehl

Interpretable machine learning in endocrinology, M. Biehl, APPIS 2024

  • 1. Interpretable machine learning in endocrinology •tumor classification •diagnosis of primary aldosteronism Michael Biehl www.cs.rug.nl/~biehl Bernoulli Ins6tute for Mathema+cs, Computer Science and Ar+ficial Intelligence University of Groningen, The Netherlands Centre for Systems Modelling & Quan6ta6ve Biomedicine University of Birmingham, UK January 2024
  • 2. 1 Steroid metabolomics - classifica+on of adrenocor6cal tumors - differen+al diagnosis of Primary Aldosteronism - other applica+ons Summary / Remarks overview Generalized Matrix Relevance Learning Vector Quan6za6on - interpretable, prototype- & distance-based classifica+on - relevance learning The mysterious learning machine © C.M. Gerigk, E.L. van den Brandhof shallow systems small data
  • 3. • training: represent data by one or several prototypes per class • working: classify a query according to the label of the nearest prototype (± bias) • decision boundaries according to (Euclidean) distances + parameterized in feature space, intuitive and interpretable one intuitive, interpretable framework: prototype-based systems for distance-based classification Learning Vector Quan1za1on (LVQ) N-dim. feature space ? x1 x2 2
  • 4. distance measure compares prototypes data points <latexit sha1_base64="UJ0CnxsmOYhqIceWvmtM1a1jzmc=">AAACAnicbZDLSsNAFIZP6q3WW7xsxM1gESpISUTRZcGNywr2Am0ok+mkHZxcmJmoJQQ3voobF4q49Snc+TZO0gra+sPAx3/OYc753YgzqSzryyjMzS8sLhWXSyura+sb5uZWU4axILRBQh6Ktosl5SygDcUUp+1IUOy7nLbcm4us3rqlQrIwuFajiDo+HgTMYwQrbfXM3X6l62M1dL3kLj36wfv0EPXMslW1cqFZsCdQru14ueo987PbD0ns00ARjqXs2FaknAQLxQinaakbSxphcoMHtKMxwD6VTpKfkKID7fSRFwr9AoVy9/dEgn0pR76rO7Md5XQtM/+rdWLlnTsJC6JY0YCMP/JijlSIsjxQnwlKFB9pwEQwvSsiQywwUTq1kg7Bnj55FprHVfu0al3pNE5grCLswT5UwIYzqMEl1KEBBB7gCV7g1Xg0no03433cWjAmM9vwR8bHN5hXmcs=</latexit> d(w, x) <latexit sha1_base64="BO0hjh0jjGtOWhN24FvEx5iLYjA=">AAACG3icbVDLSgMxFM3UV62v+ti5CRahQikzRdGNUHDjqlSxD+jUIZNm2tBMZkgyljLMf7jxV9y4UMSV4MK/Me10oa0HAodzziX3HjdkVCrT/DYyS8srq2vZ9dzG5tb2Tn53rymDSGDSwAELRNtFkjDKSUNRxUg7FAT5LiMtd3g18VsPREga8Ds1DknXR31OPYqR0pKTr9g+UgPXi0fJZXHkWKWRUynZrBcoCUdO7QRCm3KYhtz4NrmvQSdfMMvmFHCRWDNSqB54U9Sd/KfdC3DkE64wQ1J2LDNU3RgJRTEjSc6OJAkRHqI+6WjKkU9kN57elsBjrfSgFwj9uIJT9fdEjHwpx76rk5Ml5bw3Ef/zOpHyLrox5WGkCMfpR17EoArgpCjYo4JgxcaaICyo3hXiARIIK11nTpdgzZ+8SJqVsnVWNm90G6cgRRYcgiNQBBY4B1VwDeqgATB4BM/gFbwZT8aL8W58pNGMMZvZB39gfP0AhJqitA==</latexit> w = (w1, w2, . . . wN ) 2 RN <latexit sha1_base64="wnXGk4kPKNgyQFFIyuWASwkJ91Q=">AAACG3icbVDLSgMxFM3UV62v+ti5CRahQikzRdGNUHDjqlSxD+jUIZNm2tBMZkgy0jLMf7jxV9y4UMSV4MK/Me10oa0HAodzziX3HjdkVCrT/DYyS8srq2vZ9dzG5tb2Tn53rymDSGDSwAELRNtFkjDKSUNRxUg7FAT5LiMtd3g18VsPREga8Ds1DknXR31OPYqR0pKTr9g+UgPXi0fJZXHkWKWRUynZrBcoCUdO7QRCm3KYhtz4NrmvQSdfMMvmFHCRWDNSqB54U9Sd/KfdC3DkE64wQ1J2LDNU3RgJRTEjSc6OJAkRHqI+6WjKkU9kN57elsBjrfSgFwj9uIJT9fdEjHwpx76rk5Ml5bw3Ef/zOpHyLrox5WGkCMfpR17EoArgpCjYo4JgxcaaICyo3hXiARIIK11nTpdgzZ+8SJqVsnVWNm90G6cgRRYcgiNQBBY4B1VwDeqgATB4BM/gFbwZT8aL8W58pNGMMZvZB39gfP0Aiy+iuA==</latexit> x = (x1, x2, . . . 
x_N) ∈ ℝ^N
generalized distance measure: $d_\Lambda(w, x) = \sum_{i,j=1}^{N} (w_i - x_i)\,\Lambda_{ij}\,(w_j - x_j)$
- $\Lambda_{ii}$: relevance of a particular single feature
- $\Lambda_{ij}$: contribution of a pair of features
training: optimize prototypes $\{w^{(k)}\}_{k=1}^{K}$ and relevance matrix $\Lambda$ w.r.t. performance on training data (objective function)
Generalized Matrix Relevance LVQ (GMLVQ)
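The distance measure and the nearest-prototype rule can be sketched in a few lines of NumPy. This is a minimal illustration, not the author's implementation: the function names, the toy prototypes and the class labels are hypothetical, and the parameterization Λ = ΩᵀΩ with unit trace is an assumption made here to keep the matrix positive semi-definite.

```python
import numpy as np

def gmlvq_distance(w, x, Lambda):
    """d_Lambda(w, x) = sum_ij (w_i - x_i) Lambda_ij (w_j - x_j)."""
    diff = np.asarray(w, dtype=float) - np.asarray(x, dtype=float)
    return float(diff @ Lambda @ diff)

def nearest_prototype_label(x, prototypes, labels, Lambda):
    """Assign the label of the prototype closest to x under d_Lambda."""
    dists = [gmlvq_distance(w, x, Lambda) for w in prototypes]
    return labels[int(np.argmin(dists))]

# Assumption: Lambda = Omega^T Omega (positive semi-definite), unit trace.
rng = np.random.default_rng(0)
Omega = rng.normal(size=(3, 3))
Lambda = Omega.T @ Omega
Lambda /= np.trace(Lambda)

# Hypothetical toy prototypes, one per class.
prototypes = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]])
labels = ["class A", "class B"]
query = np.array([0.9, 1.1, 0.8])
predicted = nearest_prototype_label(query, prototypes, labels, Lambda)
```

With the identity matrix in place of Λ the measure reduces to the squared Euclidean distance, which is a convenient sanity check.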
  • 5. steroidogenesis, adrenal gland (pathway diagram). Steroids shown: Cholesterol, Pregnenolone, 17Preg, Progesterone, 17OHP, Deoxycorticosterone, 11-Deoxycortisol, Cortisol, Cortisone, Corticosterone, 18OH-Corticosterone, Aldosterone, DHEA, Androstenedione, Testosterone, DHT. Enzymes shown: CYP11A1, CYP17A1, CYP21A2, CYP11B1, CYP11B2, HSD3B2, HSD17B3, HSD11B1, HSD11B2, SRD5A2. Steroid classes: mineralocorticoid precursors, mineralocorticoids, glucocorticoid precursors, glucocorticoids, androgen precursors, androgens; possibly: + hybrid steroids (GC / MC). www.ensat.org
  • 6. urinary steroid metabolomics (USM): gas chromatography-mass spectrometry (GC-MS); healthy controls
  • 7. USM for tumor classification
    - adrenocortical tumors (adenoma vs. carcinoma): benign ACA, malignant ACC
    - features: e.g. 32 steroid metabolite excretion values, non-invasive measurement (24 hrs. urine)
    - set of labelled example data
    - aim: develop a tool / support system for differential diagnosis
    - idea: analyse retrospective data by machine learning, identify characteristic steroid prototypes and relevances
    - www.ensat.org, 2009
  • 8. tumor classification: Generalized Matrix LVQ, ACC vs. ACA classification
    - pre-processing: log-transformation of excretion values
    - data split into 90% training, 10% validation set
    - training: determine prototypes (representative profiles, 1 per class) and relevance matrix $\Lambda \in \mathbb{R}^{32 \times 32}$ (parameterizes distance measure)
    - validation: apply classifier to 10% hold-out data, evaluate expected performance (error rates, ROC, …)
    - repeat and average results over many random splits
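The validation protocol (log-transform, random 90%/10% splits, averaging over repetitions) can be sketched as follows. This is only an illustration of the protocol under stated assumptions: the data are synthetic, class-mean prototypes and an identity relevance matrix stand in for actual GMLVQ training, and all function names are hypothetical.

```python
import numpy as np

def class_mean_prototypes(X, y):
    """Stand-in for GMLVQ training: one prototype per class (the class mean)."""
    classes = np.unique(y)
    return classes, np.array([X[y == c].mean(axis=0) for c in classes])

def predict(X, classes, prototypes, Lambda):
    """Nearest-prototype classification under d_Lambda."""
    diff = X[:, None, :] - prototypes[None, :, :]        # shape (n, K, N)
    d = np.einsum("nki,ij,nkj->nk", diff, Lambda, diff)  # shape (n, K)
    return classes[np.argmin(d, axis=1)]

def repeated_holdout(X, y, n_splits=100, val_fraction=0.1, seed=0):
    """Mean validation error over random training/validation splits."""
    rng = np.random.default_rng(seed)
    n = len(y)
    n_val = max(1, int(round(val_fraction * n)))
    Lambda = np.eye(X.shape[1])  # assumption: no relevance learning here
    errors = []
    for _ in range(n_splits):
        perm = rng.permutation(n)
        val, train = perm[:n_val], perm[n_val:]
        classes, protos = class_mean_prototypes(X[train], y[train])
        errors.append(np.mean(predict(X[val], classes, protos, Lambda) != y[val]))
    return float(np.mean(errors))

# Synthetic, strictly positive "excretion values" with 32 features.
rng = np.random.default_rng(1)
raw = np.exp(rng.normal(size=(200, 32)))
y = np.repeat([0, 1], 100)
raw[y == 1, :4] *= 5.0          # four features differ between the classes
X = np.log(raw)                 # pre-processing: log-transformation
mean_error = repeated_holdout(X, y)
```

The splits here are plain random permutations; whether the original analysis stratified by class is not stated on the slide.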
  • 9. ROC characteristics (sensitivity vs. 1-specificity): validation set performance, on average over 1000 randomized splits
    - Euclidean (no relevances): AUC 0.87
    - diagonal rel. (only diagonal $\Lambda_{ii}$): AUC 0.93
    - full matrix (full $\Lambda \in \mathbb{R}^{32 \times 32}$): AUC 0.97
    - clear improvement due to relevance learning
    - insights beyond accuracy ?
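For reference, the area under the ROC curve can be computed directly from classifier scores as the fraction of correctly ordered positive/negative pairs. The sketch below assumes a real-valued score per sample (for a prototype classifier, for instance the difference of the distances to the two class prototypes); how the scores and the averaged curves on the slide were obtained is not stated, so this is an illustration only.

```python
import numpy as np

def auc_from_scores(scores_pos, scores_neg):
    """AUC as the probability that a positive sample scores higher than a
    negative one; ties count one half (Mann-Whitney statistic)."""
    pos = np.asarray(scores_pos, dtype=float)[:, None]
    neg = np.asarray(scores_neg, dtype=float)[None, :]
    return float(np.mean((pos > neg) + 0.5 * (pos == neg)))

# Hypothetical scores: larger means "more carcinoma-like".
auc = auc_from_scores([0.9, 0.8, 0.4], [0.5, 0.3, 0.1])
```

Averaging such values over the validation sets of many random splits gives a single summary number per classifier variant.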
  • 10. insights: relevance matrix
    - diagonal: importance of single markers (labelled in the figure: 5-PT, 5-PD, THS)
    - off-diagonal: pairs of markers (detailed inspection …)
    - facilitates selection of reduced panels with similar performance
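Reading single-marker importance off the diagonal of the relevance matrix amounts to a simple ranking. The sketch below uses a made-up 4×4 matrix and hypothetical marker names; the selection procedure actually used for the reduced panels is not given on the slide, and a reduced panel would still need to be retrained and validated.

```python
import numpy as np

def rank_markers(Lambda, names):
    """Sort markers by their diagonal relevance Lambda_ii, largest first."""
    diag = np.diag(Lambda)
    order = np.argsort(diag)[::-1]
    return [(names[i], float(diag[i])) for i in order]

def reduced_panel(Lambda, names, k):
    """Names of the k markers with the largest diagonal relevance."""
    return [name for name, _ in rank_markers(Lambda, names)[:k]]

# Made-up relevance matrix (unit trace) and hypothetical marker names.
Lambda = np.array([
    [0.50, 0.10, 0.00, 0.00],
    [0.10, 0.30, 0.00, 0.00],
    [0.00, 0.00, 0.15, 0.00],
    [0.00, 0.00, 0.00, 0.05],
])
names = ["m1", "m2", "m3", "m4"]
panel = reduced_panel(Lambda, names, k=2)
```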
  • 11. relevances: confirm - surprise - visualize. Marker (19) THS: individually discriminative (ACA vs. ACC)
  • 12. relevances: confirm - surprise - visualize. Markers (8) 5α-THA and (12) TH-Doc ? GMLVQ: multivariate analysis; discriminative combination (ACC vs. ACA)
  • 13. confirm - surprise - visualize (ACA vs. ACC): relevance matrix is dominated by leading eigenvectors
    - visualize data set and prototypes: misclassifications?
    - inspect individual cases: uncertain cases, outliers
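When the relevance matrix is dominated by its leading eigenvectors, projecting data and prototypes onto them gives a low-dimensional view in which Euclidean distances approximate the learned distance. The following is a minimal sketch with a randomly generated rank-2 matrix; the names and the scaling by the square root of the eigenvalues are choices made for this illustration, not taken from the slides.

```python
import numpy as np

def leading_projection(Lambda, n_components=2):
    """Projection onto the leading eigenvectors of Lambda, scaled by
    sqrt(eigenvalue), plus the fraction of the trace each one carries."""
    eigvals, eigvecs = np.linalg.eigh(Lambda)            # ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale, eigvals[order] / eigvals.sum()

rng = np.random.default_rng(0)
Omega = rng.normal(size=(2, 5))    # rank 2: two eigenvectors carry everything
Lambda = Omega.T @ Omega
P, explained = leading_projection(Lambda)

X = rng.normal(size=(10, 5))       # synthetic samples
Z = X @ P                          # 2-D coordinates for a scatter plot
```

Prototypes can be projected with the same matrix `P`, so that samples lying close to the "wrong" prototype or far from both stand out in the plot.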
  • 14. prospective study (2020): excellent performance of USM + machine learning; suggests a triple test strategy with excellent sensitivity and specificity; currently working on implementation in clinical practice
  • 16. primary aldosteronism (PA)
    - causes 5-10% of hypertension cases
    - most frequent form of secondary hypertension
    - increased risk for cardio- and cerebrovascular complications
    - PA subtypes and main treatment:
      UPA (unilateral PA): one adrenal gland affected by aldosterone producing adenoma (APA), several driver mutations in the tumor are known; main treatment: surgery
      BPA (bilateral PA): both adrenal glands over-producing, most frequently due to hyperplasia; main treatment: mineralocorticoid antagonists
  • 18. Healthy controls vs. all PA: near-perfect discrimination of primary aldosteronism and controls (figure: relevances)
  • 19. PA-related classification problems (similar results for RF)
  • 20. KCNJ5 vs. all other PA: very good discrimination of KCNJ5 type vs. non-KCNJ5 PA (figure: relevances)
  • 21. main findings
    - all PA vs. HC: excellent separation, characterized by increased excretion of mineralocorticoid and glucocorticoid precursors
    - all UPA vs. all BPA: suboptimal discrimination
    - KCNJ5 vs. non-KCNJ5: very good discrimination (key: hybrid steroid 18-oxo-THF)
    - potential added value: KCNJ5-positive cases are always unilateral, so the invasive test (adrenal vein sampling) can be avoided; improved therapy selection, as KCNJ5-positive cases respond better to treatment
  • 22. ongoing & future work on PA
    - more detailed relevance analysis: Iterated Relevance Matrix Analysis (IRMA), S.S. Lövdal and M. Biehl, Proc. ESANN 2023; journal manuscript under review
    - improved classifiers: UPA vs. BPA, other subtypes of PA (?), multi-class problem wrt mutations (more data needed)
    - LC-MS instead of GC-MS: faster, cheaper assessment of the steroid metabolome, also in other applications of USM + machine learning
  • 23. other application examples of steroid metabolomics ...
  • 24. exploit domain knowledge. IEEE Members News, March 2021. "Let the data speak for itself when the data cleans itself" - unknown, (c) https://twitter.com/jessenleon
  • 25. open access, 2023, 290 pages, University of Groningen Press. m.biehl@rug.nl, www.cs.rug.nl/~biehl