Advanced machine learning for metabolite identification
1. Machine Learning for Metabolite Identification from Mass Spectrometry
Hai Dai Nguyen
Bioinformatics Center, Institute for Chemical Research
Kyoto University
10/08/2019, D. H. Nguyen, Kyoto University
2. Outline
1. Introduction
2. Problems, challenges and motivations
3. Proposed methods
4. Experiments
5. Conclusions
3. Introduction: tandem mass spectrometry (MS/MS)
• MS/MS devices break compounds (molecules) into fragments and measure the abundance of each captured fragment. Each fragment corresponds to a peak.
• Peaks interact, i.e., some peaks tend to co-occur.
[Figure: peak interaction]
4. Introduction: approaches for metabolite identification
Metabolite identification
• Given a query spectrum, find the molecules in a database whose spectra are similar to it.
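[Figure: three approaches. Spectra library: compare the query spectra with a spectra database. In silico fragmentation: compare the query spectra with in silico spectra generated from a structure database. Machine learning: predict fingerprints from the query spectra and compare them with the fingerprints of structures.]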
5. Introduction: machine learning approach
Fingerprint-based two-step methods for metabolite identification:
1) Learning step: given a set of spectra and their corresponding fingerprints, learn a model that predicts fingerprints.
2) Candidate retrieval step: given a query spectrum, use its predicted fingerprint to retrieve molecules from a molecular database.
[Figure: (1) learning step: a predictor is trained to map spectra to fingerprints, which software computes from structures; (2) candidate retrieval step: the predictor maps query spectra to fingerprints, which are used to rank candidate structures.]
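The candidate retrieval step can be pictured as ranking database molecules by how similar their fingerprints are to the predicted one. This is a minimal sketch, assuming binary fingerprints stored as 0/1 lists and Tanimoto similarity as the score; the slides do not specify the scoring function, and the function names and candidate data are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two binary fingerprints of equal length."""
    both = sum(1 for x, y in zip(a, b) if x and y)    # bits set in both fingerprints
    either = sum(1 for x, y in zip(a, b) if x or y)   # bits set in at least one
    return both / either if either else 1.0           # two all-zero fingerprints count as identical


def rank_candidates(predicted_fp, candidates):
    """Sort (name, fingerprint) pairs from most to least similar to the predicted fingerprint."""
    return sorted(candidates, key=lambda c: tanimoto(predicted_fp, c[1]), reverse=True)


# Hypothetical predicted fingerprint and candidate molecules.
predicted = [1, 0, 1, 1]
candidates = [("A", [1, 0, 0, 0]), ("B", [1, 0, 1, 1]), ("C", [0, 1, 1, 1])]
print([name for name, _ in rank_candidates(predicted, candidates)])  # ['B', 'C', 'A']
```

Tanimoto is used here only because it is a familiar similarity for binary fingerprints; any score that compares a predicted fingerprint with candidate fingerprints fits the same two-step scheme.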
6. Existing methods
FingerID (Bioinformatics, 2012)
• Kernel method
  - Defines a probability product kernel (PPK) for spectra.
  - Then uses SVMs to classify each fingerprint bit.
• Drawback
  - Peak interactions are ignored.
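Each spectrum X with n_X peaks is modeled as a mixture of per-peak Gaussians p_X(k), and the PPK is the inner product of two such mixtures:
$$p_X = \frac{1}{n_X}\sum_{k=1}^{n_X} p_X(k), \qquad p_Y = \frac{1}{n_Y}\sum_{k=1}^{n_Y} p_Y(k)$$
$$K(X,Y) = \int p_X\, p_Y = \frac{1}{n_X n_Y}\sum_{i,j} \int p_X(i)\, p_Y(j)$$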
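As a rough illustration of the PPK, the sketch below models each peak as a one-dimensional Gaussian over m/z with a shared width sigma. With that choice, the integral of the product of two peak Gaussians has a closed form. Ignoring intensities and fixing sigma are simplifying assumptions, not FingerID's exact settings, and the function names are hypothetical.

```python
import math


def gaussian_overlap(mu_a, mu_b, sigma):
    """Integral over x of N(x; mu_a, sigma^2) * N(x; mu_b, sigma^2).

    The product integrates to a Gaussian density in (mu_a - mu_b) with variance 2 * sigma^2.
    """
    var = 2.0 * sigma ** 2
    return math.exp(-((mu_a - mu_b) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def ppk(peaks_x, peaks_y, sigma=0.01):
    """K(X, Y) = 1/(n_X n_Y) * sum over peak pairs (i, j) of the overlap of p_X(i) and p_Y(j)."""
    total = sum(gaussian_overlap(a, b, sigma) for a in peaks_x for b in peaks_y)
    return total / (len(peaks_x) * len(peaks_y))


# Hypothetical spectra given as lists of peak m/z values.
x = [91.05, 119.05, 147.04]
y = [91.05, 120.08]
print(ppk(x, y, sigma=0.01))
```

Only pairs of peaks that lie within a few sigma of each other contribute noticeably, so the kernel rewards spectra that share peaks. Each pair is scored on its own, which is why the PPK cannot express peak interactions.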
7. Existing methods
CSI:FingerID (Bioinformatics, 2014)
• Improved version of FingerID
  - Defines a kernel for spectra by PPK.
  - Defines kernels for fragmentation trees and combines them with the PPK via multiple kernel learning (MKL).
  - Then uses SVMs for classification.
8. Existing methods
CSI:FingerID (Bioinformatics, 2014)
Fragmentation trees
• Model the fragmentation of a molecule in MS/MS.
• Nodes correspond to peaks and are annotated with the molecular formulas of the fragments.
• Edges correspond to losses, i.e., uncharged fragments that are not captured.
• Trees can be computed from spectra and provide structural information about them.
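The following sketch makes the node/edge reading concrete. It stores a small fragmentation tree as parent → children formulas and derives the loss on each edge as the parent formula minus the child formula. The formulas are hypothetical and treated as neutral (charges and adducts are ignored), and the helper names are illustrative only.

```python
import re
from collections import Counter


def parse_formula(formula):
    """Parse a simple formula such as 'C6H12O6' into element counts (no brackets or charges)."""
    counts = Counter()
    for element, num in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(num) if num else 1
    return counts


def format_formula(counts):
    """Write C first, then H, then the remaining elements alphabetically."""
    order = [e for e in ("C", "H") if e in counts]
    order += sorted(e for e in counts if e not in ("C", "H"))
    return "".join(e + (str(counts[e]) if counts[e] > 1 else "") for e in order)


def loss_formula(parent, child):
    """Formula of the loss on edge parent -> child, i.e. parent minus child."""
    p, c = parse_formula(parent), parse_formula(child)
    if any(c[e] > p[e] for e in c):
        raise ValueError(f"{child} is not a sub-formula of {parent}")
    return format_formula({e: p[e] - c[e] for e in p if p[e] > c[e]})


# Hypothetical tree: each node is a fragment formula, each edge a loss.
tree = {"C6H12O6": ["C6H10O5"], "C6H10O5": ["C5H8O4"]}
for parent, children in tree.items():
    for child in children:
        print(parent, "->", child, "loss:", loss_formula(parent, child))
```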
9. Existing methods
CSI:FingerID (Bioinformatics, 2014)
Pros & Cons
• Pro: improved accuracy, thanks to the additional structural information provided by fragmentation trees.
• Con: computationally expensive, because fragmentation trees have to be computed from spectra.
• Con: lacks interpretability.
10. Motivation (1)
Goal: develop a learning model that
• incorporates peak interactions into learning,
• is efficient to compute at prediction time, and
• is more interpretable.
11. SIMPLE: sparse interaction model for MS
• Idea: introduce an interaction term into the model (two-way interaction model).
• Prediction model, with a main-effect term $w^\top x$ and an interaction term $x^\top W x$:
$$f(x) = b + w^\top x + x^\top W x, \qquad y(x) = \operatorname{sgn}(f(x))$$
• Objective function, combining a hinge loss, an ℓ1 penalty for sparsity of w, and a nuclear-norm penalty for a low-rank W:
$$\min_{b,w,W} \sum_{i=1}^{n} [1 - y_i f(x_i)]_+ + \alpha \lVert w \rVert_1 + \beta \lVert W \rVert_*$$
• The objective is convex, so a globally optimal solution can be found.
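The prediction model and part of the objective can be written out directly. This minimal sketch assumes that x is a binned spectrum given as a list and W is a d×d list of lists. It evaluates f(x), the sign prediction, and the hinge loss plus α‖w‖₁. The β‖W‖_* term is left out because it needs a singular value decomposition, for example from a linear algebra library. All names are hypothetical.

```python
def predict_score(x, b, w, W):
    """f(x) = b + w^T x + x^T W x for one feature vector x."""
    d = len(x)
    main_effect = sum(w[i] * x[i] for i in range(d))
    interactions = sum(W[i][j] * x[i] * x[j] for i in range(d) for j in range(d))
    return b + main_effect + interactions


def predict_bit(x, b, w, W):
    """y(x) = sgn(f(x)); f(x) = 0 is mapped to +1 here, an arbitrary tie-breaking choice."""
    return 1 if predict_score(x, b, w, W) >= 0 else -1


def hinge_plus_l1(X, y, b, w, W, alpha):
    """Sum of [1 - y_i f(x_i)]_+ plus alpha * ||w||_1 (the beta * ||W||_* term is omitted)."""
    hinge = sum(max(0.0, 1.0 - yi * predict_score(xi, b, w, W)) for xi, yi in zip(X, y))
    return hinge + alpha * sum(abs(wi) for wi in w)


# Hypothetical two-peak example: peaks 0 and 1 interact through W[0][1] = W[1][0].
w = [-0.5, -0.5]
W = [[0.0, 1.0], [1.0, 0.0]]
print(predict_bit([1, 1], 0.0, w, W), predict_bit([1, 0], 0.0, w, W))  # 1 -1
```

A symmetric W counts each pair twice in $x^\top W x$, so the effective interaction weight of peaks 0 and 1 in this example is 2.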
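12. L-SIMPLE: sparse interaction model for MS
• Idea: introduce background knowledge (peak interactions derived from fragmentation trees) to regularize W.
• Laplacian regularization on features:
  - $x^\top W x = \sum_{i,j} w_{ij} x_i x_j = \sum_{i,j} (v_i^\top v_j) x_i x_j$
  - $W$ can be decomposed as $W = V^\top V$
  - $R(V) = \frac{1}{2}\sum_{i,j} A_{ij} \lVert v_i - v_j \rVert^2 = \operatorname{trace}(WL)$, where $A_{ij}$ weights the interaction between peaks $i$ and $j$, $D$ is the diagonal degree matrix with $D_{ii} = \sum_j A_{ij}$, and $L = D - A$ is the Laplacian matrix
• New objective function:
$$\min_{b,w,W} \sum_{i=1}^{n} [1 - y_i f(x_i)]_+ + \alpha \lVert w \rVert_1 + \beta \lVert W \rVert_* + \gamma\, \operatorname{trace}(WL)$$
• The problem is still convex, because $\operatorname{trace}(WL)$ is linear in $W$.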
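The identity between the graph smoothness penalty and trace(WL) can be checked numerically. With the factor ½, the two sides agree; without it, the sum equals 2·trace(WL). This sketch assumes a symmetric interaction matrix A and a latent matrix V whose i-th column is v_i. The numbers are made up for illustration.

```python
def laplacian(A):
    """L = D - A, where D is the diagonal matrix of row sums of the symmetric matrix A."""
    n = len(A)
    return [[(sum(A[i]) if i == j else 0.0) - A[i][j] for j in range(n)] for i in range(n)]


def gram(V):
    """W = V^T V for a k x d matrix V whose column i is the latent vector v_i of peak i."""
    k, d = len(V), len(V[0])
    return [[sum(V[r][i] * V[r][j] for r in range(k)) for j in range(d)] for i in range(d)]


def smoothness(V, A):
    """R(V) = 1/2 * sum_{i,j} A_ij * ||v_i - v_j||^2."""
    k, d = len(V), len(V[0])
    return 0.5 * sum(
        A[i][j] * sum((V[r][i] - V[r][j]) ** 2 for r in range(k))
        for i in range(d)
        for j in range(d)
    )


def trace_product(W, L):
    """trace(W L) = sum_{i,j} W_ij * L_ji."""
    n = len(W)
    return sum(W[i][j] * L[j][i] for i in range(n) for j in range(n))


# Made-up example: 3 peaks, 2-dimensional latent vectors, interactions 0-1 and 1-2.
V = [[1.0, 0.0, 2.0], [0.5, 1.0, -1.0]]
A = [[0.0, 1.0, 0.0], [1.0, 0.0, 2.0], [0.0, 2.0, 0.0]]
print(smoothness(V, A), trace_product(gram(V), laplacian(A)))  # 17.25 17.25
```

The penalty is small when strongly interacting peaks (large $A_{ij}$) have similar latent vectors. This is how interactions from fragmentation trees shape W without being hard-coded into it.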
13. L-SIMPLE: sparse interaction model for MS
Optimization
• The objective function (1) contains several non-smooth terms.
• It is solved with the alternating direction method of multipliers (ADMM):
  - introduce C to turn (1) into a constrained problem (2);
  - define the augmented Lagrangian function (3);
  - iterate between updates of the primal and dual variables (4).
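For orientation, a generic ADMM splitting of this kind can be written as follows. The form of (2) to (4) is assumed here, not copied from the slide. In this sketch, g(b, w, W) collects every term of (1) except the nuclear norm, C is an auxiliary copy of W that carries β‖C‖_*, Λ is the dual variable, and ρ > 0 is the penalty parameter.

$$
\begin{aligned}
&\min_{b,w,W,C}\; g(b,w,W) + \beta\lVert C\rVert_* \quad \text{s.t. } W = C\\
&\mathcal{L}_\rho(b,w,W,C,\Lambda) = g(b,w,W) + \beta\lVert C\rVert_* + \langle \Lambda, W - C\rangle + \tfrac{\rho}{2}\lVert W - C\rVert_F^2\\
&(b,w,W)^{k+1} = \arg\min_{b,w,W}\; \mathcal{L}_\rho(b,w,W,C^{k},\Lambda^{k})\\
&C^{k+1} = \mathrm{SVT}_{\beta/\rho}\!\left(W^{k+1} + \Lambda^{k}/\rho\right)\\
&\Lambda^{k+1} = \Lambda^{k} + \rho\,(W^{k+1} - C^{k+1})
\end{aligned}
$$

Here $\mathrm{SVT}_\tau$ soft-thresholds the singular values of its argument by τ, which is the proximal operator of τ‖·‖_*. The C-update follows from completing the square: $-\langle\Lambda, C\rangle + \tfrac{\rho}{2}\lVert W - C\rVert_F^2 = \tfrac{\rho}{2}\lVert C - (W + \Lambda/\rho)\rVert_F^2 + \text{const}$.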
14. Experiments: Dataset and settings
Dataset
• MassBank: 402 spectra and their corresponding molecular structures.
• Mass range: [1, 963] (m/z).
• Fingerprints of 528 bits, computed with OpenBabel.
Settings
• Feature dimension: 500, obtained by dividing the mass range into bins.
• Hyperparameters α, β, γ are chosen by 5-fold CV.
• Evaluation: accuracy and F1 score.
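A minimal sketch of the featurization and the two metrics is shown below. It assumes equal-width bins over [1, 963] and keeps the maximum intensity per bin. The slide only says that the mass range is divided into 500 bins, so both choices are assumptions, and the function names are hypothetical.

```python
def bin_spectrum(peaks, mz_min=1.0, mz_max=963.0, n_bins=500):
    """Map (m/z, intensity) peaks to n_bins equal-width bins over [mz_min, mz_max].

    Each bin keeps the maximum intensity that falls into it; peaks outside the range are dropped.
    """
    width = (mz_max - mz_min) / n_bins
    vec = [0.0] * n_bins
    for mz, intensity in peaks:
        if not mz_min <= mz <= mz_max:
            continue
        idx = min(int((mz - mz_min) / width), n_bins - 1)  # mz == mz_max goes to the last bin
        vec[idx] = max(vec[idx], intensity)
    return vec


def accuracy_and_f1(y_true, y_pred):
    """Accuracy and F1 score for one fingerprint bit (labels are 0/1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    acc = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    f1 = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 1.0
    return acc, f1


x = bin_spectrum([(1.0, 0.5), (100.0, 0.2), (100.5, 0.7), (963.0, 1.0), (2000.0, 0.9)])
print(len(x), x[0], x[51], x[499])                   # 500 0.5 0.7 1.0
print(accuracy_and_f1([1, 0, 1, 1], [1, 0, 0, 1]))  # (0.75, 0.8)
```

Each bin is about 1.9 m/z wide, so nearby peaks such as 100.0 and 100.5 fall into the same feature.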
16. Experiments: Results
Case study (1) illustrating the effect of peak interactions: $w_1 x_1 + w_2 x_2 + W_{12} x_1 x_2$
17. Experiments: Results
Case study (2) illustrating the effect of peak interactions: $w_1 x_1 + w_2 x_2 + W_{12} x_1 x_2$
18. Experiments: Results
Case study (3) illustrating the effect of peak interactions: $w_1 x_1 + w_2 x_2 + W_{12} x_1 x_2$
These results confirm the importance of peak interactions for fingerprint prediction.
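The two-peak form used in the case studies can be evaluated with hypothetical weights. The example shows how an interaction term can turn on a fingerprint bit that neither peak supports on its own. The weights below are invented for illustration and are not taken from the case studies.

```python
def two_peak_score(x1, x2, w1, w2, w12, b=0.0):
    """b + w1*x1 + w2*x2 + W12*x1*x2, the two-peak form used in the case studies."""
    return b + w1 * x1 + w2 * x2 + w12 * x1 * x2


# Hypothetical weights: each peak alone argues against the bit,
# but the two peaks together argue for it through the interaction weight.
w1, w2, w12 = -0.5, -0.5, 2.0
for x1, x2 in [(1, 0), (0, 1), (1, 1)]:
    score = two_peak_score(x1, x2, w1, w2, w12)
    print(x1, x2, score, "bit on" if score > 0 else "bit off")
```

With W12 = 0, the model is purely additive, and adding the second peak can only lower the score further. A positive score for the pair therefore requires the interaction term.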
19. Summary
• Focused on fingerprint prediction (the first step) for metabolite identification.
• The proposed sparse interaction model (SIMPLE) incorporates peak interactions, is more interpretable, and is faster in prediction.
20. Recap: machine learning approach
• Fingerprint-based two-step methods for metabolite identification.
Drawbacks of software-based fingerprints:
1) Not task-specific, which limits predictive performance.
2) Large in size, which slows down prediction.
[Figure: (1) learning step to predict fingerprints; (2) candidate retrieval step to rank candidates]
21. Motivation (2)
• Learn a model that generates representations (molecular vectors) of metabolites from their molecular structures, with the following advantages:
1) Adaptive to the metabolite identification task: molecular vectors are specific to the task and data, and therefore non-redundant, possibly leading to better predictive performance.
2) Shorter representations of metabolites: molecular vectors are not necessarily binary and can concentrate the information relevant to the task, so they can be much smaller.
22. Motivation (2)
• Fingerprint-based two-step methods for metabolite identification.
[Figure: a predictor maps spectra to software-based fingerprints, which software computes from structures]
23. Motivation (2)
• Learning molecular vectors for metabolite identification.
[Figure: a predictor maps spectra to molecular vectors, while φ is learned to generate molecular vectors from structures]
Challenge: learning φ is hard because no supervised information is available.
Main idea: learn φ to maximize the correlation between spectra and structures.
24. Proposed method: ADAPTIVE
ADAPTIVE has two steps: learning and candidate retrieval.
1) Learning, which consists of two subtasks:
• Subtask #1: learn a mapping from structures to molecular vectors, given spectrum-structure pairs from a spectra database (the main technical contribution).
• Subtask #2: learn a mapping from spectra to molecular vectors, given spectrum-vector pairs (with the molecular vectors obtained in Subtask #1).
2) Candidate retrieval
• Given a query spectrum, generate its molecular vector and search it against the molecular vectors generated from candidate structures.
25. ADAPTIVE: Learning step
Idea: learn to maximize the correlation, measured by HSIC, between spectra and structures.
[Figure: Subtask 1 learns a mapping from structures to molecular vectors; Subtask 2 learns a mapping from spectra to molecular vectors; HSIC measures the dependence between spectra and molecular vectors]
26. ADAPTIVE: Learning step
Subtask 1: Structures → Molecular vectors
• Use a message passing neural network (MPNN) to parameterize φ:
  - it takes graphs (molecules) as inputs;
  - it learns representation vectors at different levels.
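[Figure: structures are mapped by φ to molecular vectors, which are linked to spectra by HSIC]
• Use the Hilbert-Schmidt Independence Criterion (HSIC) to measure the dependence between molecular vectors and spectra:
$$\mathrm{HSIC}(X, \phi(Y)) = \langle \tilde{K}_n, \tilde{\Phi}_n^\top \tilde{\Phi}_n \rangle$$
where $\tilde{K}_n$ is the centered kernel matrix of the set of n spectra and $\tilde{\Phi}_n^\top \tilde{\Phi}_n$ is the centered kernel matrix of their molecular vectors.
• Repeatedly shuffle the samples (T times) and divide them into n/B smaller subsets of size B:
$$J = \frac{1}{T}\sum_{t=1}^{T} \frac{1}{n/B}\sum_{b=1}^{n/B} \langle \tilde{K}_{t,b}, \tilde{\Phi}_{t,b}^\top \tilde{\Phi}_{t,b} \rangle \qquad (1)$$
• The parameters of φ can be optimized by back-propagating the HSIC-based loss (1).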
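The HSIC term and the shuffled mini-batch objective J in (1) can be sketched as below under several assumptions. The sketch uses a linear kernel on molecular vectors (matching the $\tilde{\Phi}^\top\tilde{\Phi}$ form), takes the spectra kernel as a precomputed matrix, and omits any normalizing constant. It only evaluates J. In ADAPTIVE the vectors come from the MPNN φ and J is optimized by back-propagation, which is not shown. Function names are hypothetical.

```python
import random


def center(K):
    """Double-center a kernel matrix: H K H with H = I - (1/n) * ones."""
    n = len(K)
    row = [sum(K[i]) / n for i in range(n)]
    col = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    total = sum(row) / n
    return [[K[i][j] - row[i] - col[j] + total for j in range(n)] for i in range(n)]


def linear_kernel(vectors):
    """Gram matrix Phi^T Phi of molecular vectors (one vector per sample)."""
    return [[sum(a * b for a, b in zip(u, v)) for v in vectors] for u in vectors]


def hsic(K_spectra, mol_vectors):
    """<K~, Phi~^T Phi~>: Frobenius inner product of the two centered kernels."""
    Kc = center(K_spectra)
    Pc = center(linear_kernel(mol_vectors))
    n = len(Kc)
    return sum(Kc[i][j] * Pc[i][j] for i in range(n) for j in range(n))


def minibatch_objective(K_spectra, mol_vectors, batch_size, n_shuffles, seed=0):
    """J: average HSIC over n/B subsets of size B, averaged over T random shuffles."""
    n = len(mol_vectors)
    n_batches = n // batch_size  # leftover samples are dropped in each shuffle (assumption)
    if n_batches == 0:
        raise ValueError("batch_size must not exceed the number of samples")
    rng = random.Random(seed)
    idx = list(range(n))
    total = 0.0
    for _ in range(n_shuffles):
        rng.shuffle(idx)
        s = 0.0
        for b in range(n_batches):
            sub = idx[b * batch_size:(b + 1) * batch_size]
            K_sub = [[K_spectra[i][j] for j in sub] for i in sub]
            s += hsic(K_sub, [mol_vectors[i] for i in sub])
        total += s / n_batches
    return total / n_shuffles


vecs = [[1.0], [2.0], [3.0], [4.0]]
K = linear_kernel(vecs)  # toy spectra kernel that happens to agree with the vectors
print(hsic(K, vecs), minibatch_objective(K, vecs, batch_size=2, n_shuffles=3))
```

Computing HSIC on subsets of size B keeps each kernel matrix at B×B instead of n×n, which is what makes back-propagating through it affordable.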
27. ADAPTIVE: Learning step
Subtask 2: Spectra → Molecular vectors
• Use Input-Output Kernel Regression (IOKR) to learn h.
• Find the mapping ĥ that minimizes the regularized least-squares error (2).
• According to the representer theorem, the solution can be represented in the form (3).
• Replacing ĥ in (2) with (3), the coefficients of (3) can be estimated easily.
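A common IOKR choice uses an identity operator-valued kernel, which reduces the problem to kernel ridge regression: ĥ(x) = Σ_i α_i φ(y_i) with α = (K + λI)⁻¹ k_x. Here K is the kernel matrix over training spectra and k_x holds the kernel values between x and the training spectra. This choice is an assumption, not taken from the slides, and some formulations scale λ by the number of training samples. The sketch solves the linear system in pure Python, and all names are hypothetical.

```python
def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting (A is n x n, nonsingular)."""
    n = len(A)
    M = [list(A[i]) + [rhs[i]] for i in range(n)]  # augmented matrix [A | rhs]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def iokr_predict(K_train, k_query, Y_vectors, lam):
    """h(x) = sum_i alpha_i * phi(y_i) with alpha = (K + lam * I)^(-1) k_x."""
    n = len(K_train)
    A = [[K_train[i][j] + (lam if i == j else 0.0) for j in range(n)] for i in range(n)]
    alpha = solve(A, k_query)
    d = len(Y_vectors[0])
    return [sum(alpha[i] * Y_vectors[i][k] for i in range(n)) for k in range(d)]


# Toy example: two training spectra with an identity kernel between them.
K_train = [[1.0, 0.0], [0.0, 1.0]]
Y_vectors = [[2.0, 4.0], [6.0, 8.0]]  # molecular vectors phi(y_i) from Subtask 1
k_query = [1.0, 0.0]                  # kernel values between the query and training spectra
print(iokr_predict(K_train, k_query, Y_vectors, lam=1.0))  # [1.0, 2.0]
```

Because (K + λI) depends only on the training spectra, its factorization can be computed once and reused for every query spectrum.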
28. ADAPTIVE: Candidate retrieval
• Convert the query spectrum x into a molecular vector using the learned ĥ.
• Convert the candidate structures y_1, y_2, … into molecular vectors using the learned φ.
• Find the candidate y whose molecular vector φ(y) best matches ĥ(x).
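The exact selection criterion on the slide was an equation that did not survive extraction. As an assumption, the sketch below ranks candidates by the squared Euclidean distance between ĥ(x) and φ(y), which is a common choice for IOKR-style pre-image search. The vectors and ids are hypothetical.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two molecular vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def retrieve(h_x, candidate_vectors):
    """Rank candidate ids by the distance between h(x) and phi(y), closest first."""
    return sorted(candidate_vectors, key=lambda cid: squared_distance(h_x, candidate_vectors[cid]))


# Hypothetical predicted vector h(x) and candidate vectors phi(y_k).
h_x = [1.0, 2.0]
candidates = {"y1": [0.0, 0.0], "y2": [1.0, 2.1], "y3": [4.0, 4.0]}
print(retrieve(h_x, candidates))  # ['y2', 'y1', 'y3']
```

Since the candidate vectors φ(y) do not depend on the query, they can be computed once for the whole structure database.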
29. Experiments: Dataset and settings
• A benchmark dataset of 4,138 spectra extracted from GNPS was divided into 10 subsets for cross-validation (CV).
• Performance (accuracy and computation time) was evaluated by 10-fold CV.
• Hyperparameters were determined by CV.
• 24 kernels defined for spectra were combined into a single kernel by multiple kernel learning, using UNIMKL (equal weights for all kernels) and ALIGNF (optimized weights for each kernel).
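The UNIMKL combination can be sketched as a plain average of kernels. Each kernel is first cosine-normalized so that no single kernel dominates because of its scale. The normalization is an assumption, and the function names are hypothetical. ALIGNF instead optimizes the weights, commonly by maximizing the centered alignment with the output kernel, and is not shown here.

```python
import math


def normalize_kernel(K):
    """Cosine normalization K_ij / sqrt(K_ii * K_jj), giving a unit diagonal."""
    n = len(K)
    return [[K[i][j] / math.sqrt(K[i][i] * K[j][j]) for j in range(n)] for i in range(n)]


def unimkl(kernels):
    """Combine kernels with equal weights 1/m after normalizing each one."""
    m = len(kernels)
    normed = [normalize_kernel(K) for K in kernels]
    n = len(normed[0])
    return [[sum(Kn[i][j] for Kn in normed) / m for j in range(n)] for i in range(n)]


# Two toy 2 x 2 spectra kernels standing in for the 24 kernels of the experiments.
K1 = [[1.0, 0.5], [0.5, 1.0]]
K2 = [[4.0, 0.0], [0.0, 4.0]]
print(unimkl([K1, K2]))  # [[1.0, 0.25], [0.25, 1.0]]
```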
30. Experiments: Results
• Predictive performance
[Figure: comparison of top-1, top-10 and top-20 accuracies of ADAPTIVE with existing methods]
ADAPTIVE achieves higher accuracies (~4%) than existing methods because the generated vectors are data-driven and task-specific.
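Top-k accuracy, the metric compared on this slide, counts a query as correct when the true molecule appears among the first k ranked candidates. The sketch below is a minimal version with hypothetical rankings.

```python
def top_k_accuracy(rankings, true_ids, k):
    """Fraction of queries whose correct molecule is among the first k ranked candidates."""
    hits = sum(1 for ranking, truth in zip(rankings, true_ids) if truth in ranking[:k])
    return hits / len(true_ids)


# Hypothetical ranked candidate lists for three query spectra.
rankings = [["a", "b", "c"], ["b", "a", "c"], ["c", "b", "a"]]
truth = ["a", "a", "a"]
print([top_k_accuracy(rankings, truth, k) for k in (1, 2, 3)])
```

Top-k accuracy can only increase with k, so top-20 is always at least as high as top-10 and top-1.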
31. Experiments: Results
• Computational efficiency
[Figure: comparison of prediction time of ADAPTIVE and IOKR (considered the fastest existing method)]
ADAPTIVE is about 4 times faster than IOKR because the generated vectors are much more concise than software-based fingerprints.
32. Summary
• We proposed ADAPTIVE, which learns a model to generate molecular vectors for metabolites in the metabolite identification task.
• The model is learned by maximizing the dependency between spectra and molecular structures (available in spectra datasets).
• Molecular vectors are data-driven and task-specific, leading to improved accuracy.
• Molecular vectors are shorter than software-based fingerprints, leading to faster computation.
33. References
• D. H. Nguyen et al., “Recent advances and prospects of computational methods for metabolite identification: a review with emphasis on machine learning approaches”. Briefings in Bioinformatics, 2018.
• D. H. Nguyen et al., “SIMPLE: sparse interaction model over peaks of molecules for fast, interpretable metabolite identification from MS/MS”. ISMB 2018.
• D. H. Nguyen et al., “ADAPTIVE: learning data-driven, concise molecular vectors for fast, accurate metabolite identification from MS/MS”. ISMB/ECCB 2019.
34. Thank you for listening
Q&A