Machine Learning for Metabolite Identification from Mass Spectrometry
Hai Dai Nguyen
Bioinformatics Center, Institute for Chemical Research
Kyoto University
10/08/2019, D. H. Nguyen, Kyoto University
Outline
1. Introduction
2. Problems, challenges and motivations
3. Proposed methods
4. Experiments
5. Conclusions
Introduction
Tandem mass spectrometry (MS/MS)
§ The instrument fragments compounds (molecules) and records the abundance of each captured fragment; each captured fragment corresponds to a peak.
§ Peaks interact with one another (co-occurrence of peaks).
(Figure: peak interaction.)
Introduction: approaches for metabolite identification
Metabolite identification
§ Given a query spectrum, find molecules in a database whose spectra are similar to it.
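(Figure: three approaches. Spectral library: the query spectrum is matched against a spectra database. In silico fragmentation: the query spectrum is matched against in silico spectra generated from a structure database. Machine learning: spectra and structures are linked through fingerprints.)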
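The spectral-library approach can be sketched in a few lines. This is a minimal illustration, assuming the spectra have already been binned into equal-length intensity vectors and that cosine similarity is the matching score; the library entries and the `library_search` helper are hypothetical, not part of any cited tool.

```python
import math

def cosine(u, v):
    """Cosine similarity between two binned intensity vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)

def library_search(query, library, top_k=3):
    """Rank library entries (name -> binned spectrum) by cosine similarity to the query."""
    scored = [(name, cosine(query, spec)) for name, spec in library.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]

# Hypothetical library of binned spectra (4 bins each).
library = {
    "molecule_A": [0.0, 1.0, 0.5, 0.0],
    "molecule_B": [1.0, 0.0, 0.0, 0.2],
    "molecule_C": [0.0, 0.9, 0.6, 0.1],
}
query = [0.0, 1.0, 0.55, 0.05]
print(library_search(query, library, top_k=2))
```

The in silico approach follows the same pattern, except that the library spectra are predicted from candidate structures instead of measured.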
Introduction: machine learning approach
Fingerprint-based two-step methods for metabolite identification:
1) Learning step: given a set of spectra and their corresponding fingerprints, learn a model that predicts fingerprints from spectra.
2) Candidate retrieval step: given a query spectrum, use its predicted fingerprint to retrieve molecules from a molecular database.
(Figure: in the learning step, a predictor is trained to map spectra to fingerprints, which software computes from structures; in the candidate retrieval step, the predicted fingerprint of a query spectrum is used to rank candidate structures.)
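The candidate retrieval step can be illustrated with a minimal sketch. It assumes the predictor outputs one probability per fingerprint bit, thresholds them at 0.5, and scores candidates with the Tanimoto coefficient; the threshold and the scoring function are illustrative choices, and the candidate names and fingerprints are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity of two binary fingerprints given as 0/1 lists."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 1.0

def retrieve(predicted_probs, candidates, threshold=0.5):
    """Threshold predicted bit probabilities, then rank candidates by Tanimoto similarity."""
    query_fp = [1 if p >= threshold else 0 for p in predicted_probs]
    ranked = sorted(candidates.items(),
                    key=lambda kv: tanimoto(query_fp, kv[1]), reverse=True)
    return [name for name, _ in ranked]

# Hypothetical candidate structures with their software-computed fingerprints.
candidates = {
    "cand_1": [1, 0, 1, 1, 0],
    "cand_2": [1, 1, 0, 0, 0],
    "cand_3": [1, 0, 1, 0, 0],
}
predicted = [0.9, 0.2, 0.8, 0.7, 0.1]  # hypothetical output of the learned predictor
print(retrieve(predicted, candidates))
```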
Existing methods
FingerID (Bioinformatics, 2012)
§ Kernel method
• Define a probability product kernel (PPK) for spectra.
• Then use an SVM for classification.
§ Drawback
• Peak interactions are ignored.
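$$p_X = \frac{1}{n_X} \sum_{k=1}^{n_X} p_X^{(k)}, \qquad p_Y = \frac{1}{n_Y} \sum_{k=1}^{n_Y} p_Y^{(k)}$$
$$K(X, Y) = \int p_X(x)\, p_Y(x)\, dx = \frac{1}{n_X n_Y} \sum_{i,j} \int p_X^{(i)}(x)\, p_Y^{(j)}(x)\, dx$$
where $n_X$ is the number of peaks in spectrum $X$ and $p_X^{(k)}$ is a Gaussian density placed at its $k$-th peak.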
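For Gaussian peaks the integral in the PPK has a closed form, so the kernel reduces to a double sum over peak pairs. The sketch below is a simplification, assuming each peak is a one-dimensional Gaussian over m/z with a shared width `sigma` (peak intensities are ignored here); the peak lists and the `sigma` value are hypothetical.

```python
import math

def gaussian_product_integral(mu1, mu2, sigma):
    """Integral over x of N(x; mu1, sigma^2) * N(x; mu2, sigma^2).

    Two Gaussians with equal variance integrate to N(mu1; mu2, 2 * sigma^2).
    """
    var = 2.0 * sigma ** 2
    return math.exp(-((mu1 - mu2) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def ppk(spec_x, spec_y, sigma=0.01):
    """Probability product kernel between two spectra given as lists of peak m/z values.

    Each spectrum is an equally weighted Gaussian mixture, one component per peak, so
    K(X, Y) = 1 / (n_X * n_Y) * sum_{i,j} integral of p_X^(i) * p_Y^(j).
    """
    total = sum(gaussian_product_integral(a, b, sigma) for a in spec_x for b in spec_y)
    return total / (len(spec_x) * len(spec_y))

x = [91.05, 119.05, 147.04]  # hypothetical peak lists
y = [91.05, 120.08, 147.04]
print(ppk(x, y), ppk(x, x))
```

Because only peaks within a few `sigma` of each other contribute, the kernel rewards shared peaks, but it scores each pair independently, which is why peak interactions are not captured.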
Existing methods
CSI:FingerID (Bioinformatics, 2014)
§ Improved version of FingerID
• Define a kernel for spectra using PPK.
• Define kernels for fragmentation trees and combine them with PPK via multiple kernel learning (MKL).
• Then use an SVM for classification.
Existing methods
CSI:FingerID (Bioinformatics, 2014)
Fragmentation trees
§ Model how a molecule fragments in MS/MS.
§ Nodes correspond to peaks, i.e. to the molecular formulas of fragments.
§ Edges correspond to losses, i.e. to uncharged fragments that are not captured.
§ Trees can be predicted from spectra and provide structural information about them.
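A fragmentation tree can be stored as a mapping from each fragment formula to its children, and the loss on each edge is the formula difference between parent and child. The sketch below assumes simple formulas without brackets or charges; the tree and the helper names are hypothetical.

```python
import re
from collections import Counter

def parse_formula(formula):
    """Parse a simple molecular formula such as 'C6H12O6' (no brackets or charges)."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts

def format_formula(counts):
    """Write element counts back as a formula: C first, H second, then alphabetical."""
    def order(element):
        return (0 if element == "C" else 1 if element == "H" else 2, element)
    parts = []
    for element in sorted(counts, key=order):
        n = counts[element]
        if n > 0:
            parts.append(element if n == 1 else f"{element}{n}")
    return "".join(parts)

def edge_loss(parent, child):
    """Formula of the uncharged loss on the edge parent -> child."""
    diff = parse_formula(parent)
    diff.subtract(parse_formula(child))
    if any(n < 0 for n in diff.values()):
        raise ValueError("child formula is not contained in the parent formula")
    return format_formula(diff)

def tree_losses(tree):
    """List (parent, child, loss) for every edge of a tree given as {node: [children]}."""
    return [(p, c, edge_loss(p, c)) for p, children in tree.items() for c in children]

# Hypothetical tree: each node is the formula of one fragment (one peak).
tree = {"C6H12O6": ["C6H10O5"], "C6H10O5": ["C5H10O4"]}
for parent, child, loss in tree_losses(tree):
    print(f"{parent} -> {child}: loss {loss}")
```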
Existing methods
CSI:FingerID (Bioinformatics, 2014)
Pros & Cons
§ Improved accuracy thanks to the additional structural information provided by the trees
§ Computationally expensive, because trees must be computed from spectra
§ Lack of interpretability
Motivation (1)
To develop a learning model that
§ incorporates peak interactions,
§ is efficient to compute at prediction time, and
§ is more interpretable.
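SIMPLE: sparse interaction model for MS
• Idea: add an interaction term to the model (two-way interaction model).
• Prediction model:
$$f(x) = b + w^\top x + x^\top W x, \qquad y(x) = \operatorname{sgn}(f(x))$$
where $w^\top x$ is the main effect of individual peaks and $x^\top W x$ captures the interactions between peaks.
• Objective function:
$$\min_{b,\,w,\,W} \sum_{i=1}^{n} \big[1 - y_i f(x_i)\big]_+ + \alpha \lVert w \rVert_1 + \beta \lVert W \rVert_*$$
The three terms are the hinge loss, an $\ell_1$ penalty that makes $w$ sparse, and a nuclear-norm penalty that makes $W$ low-rank.
• The objective is convex, so a globally optimal solution can be found.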
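The SIMPLE prediction rule and the first two terms of its objective can be written directly from the formulas above. This is a minimal sketch with hypothetical parameters; the nuclear-norm term β‖W‖_* is omitted because it needs singular values (for example via `numpy.linalg.norm(W, "nuc")`), and no optimizer is included.

```python
def simple_score(x, b, w, W):
    """f(x) = b + w^T x + x^T W x for a feature vector x (dense lists)."""
    d = len(x)
    main_effect = sum(w[k] * x[k] for k in range(d))
    interactions = sum(W[i][j] * x[i] * x[j] for i in range(d) for j in range(d))
    return b + main_effect + interactions

def simple_predict(x, b, w, W):
    """y(x) = sgn(f(x)); a score of exactly 0 is mapped to +1 here (an arbitrary choice)."""
    return 1 if simple_score(x, b, w, W) >= 0 else -1

def hinge_plus_l1(X, y, b, w, W, alpha):
    """Hinge loss plus alpha * ||w||_1, i.e. the objective without the nuclear-norm term."""
    hinge = sum(max(0.0, 1.0 - yi * simple_score(xi, b, w, W)) for xi, yi in zip(X, y))
    return hinge + alpha * sum(abs(v) for v in w)

# Hypothetical parameters for one fingerprint bit over two peak features.
b, w = -0.5, [0.2, 0.2]
W = [[0.0, 0.5], [0.5, 0.0]]
X, y = [[1.0, 1.0], [1.0, 0.0]], [1, -1]
print([simple_predict(x, b, w, W) for x in X], hinge_plus_l1(X, y, b, w, W, alpha=0.1))
```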
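L-SIMPLE: sparse interaction model for MS
• Idea: introduce background knowledge (interactions between peaks derived from fragmentation trees) to regularize $W$.
• Laplacian regularization on features:
- $x^\top W x = \sum_{i,j} w_{ij} x_i x_j = \sum_{i,j} (v_i^\top v_j)\, x_i x_j$
- $W$ can be decomposed as $W = V^\top V$, where $v_i$ is the $i$-th column of $V$.
- $R(V) = \frac{1}{2} \sum_{i,j} A_{ij} \lVert v_i - v_j \rVert^2 = \operatorname{trace}(WL)$, where $A$ holds the tree-derived interactions between peaks and $L = D - A$ is its Laplacian matrix ($D$ is the diagonal matrix of row sums of $A$).
§ New objective function:
$$\min_{b,\,w,\,W} \sum_{i=1}^{n} \big[1 - y_i f(x_i)\big]_+ + \alpha \lVert w \rVert_1 + \beta \lVert W \rVert_* + \gamma \operatorname{trace}(WL)$$
§ The objective is still convex, because the added term is linear in $W$.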
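The identity between the Laplacian penalty and trace(WL) can be checked numerically. The sketch assumes a symmetric adjacency matrix A with zero diagonal and takes v_i as the i-th column of V; the matrices are hypothetical toy values.

```python
def gram(V):
    """W = V^T V, where v_i is column i of V (V given as a list of rows)."""
    k, d = len(V), len(V[0])
    return [[sum(V[r][i] * V[r][j] for r in range(k)) for j in range(d)] for i in range(d)]

def laplacian(A):
    """L = D - A, with D the diagonal matrix of row sums of A."""
    d = len(A)
    return [[(sum(A[i]) if i == j else 0.0) - A[i][j] for j in range(d)] for i in range(d)]

def laplacian_penalty(V, A):
    """R(V) = 1/2 * sum_{i,j} A_ij * ||v_i - v_j||^2."""
    k, d = len(V), len(V[0])
    total = 0.0
    for i in range(d):
        for j in range(d):
            total += A[i][j] * sum((V[r][i] - V[r][j]) ** 2 for r in range(k))
    return 0.5 * total

def trace_product(W, L):
    """trace(W L)."""
    d = len(W)
    return sum(W[i][j] * L[j][i] for i in range(d) for j in range(d))

# Hypothetical 2-dimensional factors for 3 peaks; A links peaks 0-1 and 1-2.
V = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 1.0]]
A = [[0.0, 1.0, 0.0],
     [1.0, 0.0, 1.0],
     [0.0, 1.0, 0.0]]
print(laplacian_penalty(V, A), trace_product(gram(V), laplacian(A)))  # both equal 3.0
```

Without the factor 1/2 the double sum counts every linked pair twice and equals 2·trace(WL); the constant can equally be absorbed into γ.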
L-SIMPLE: sparse interaction model for MS
Optimization
§ The full objective function contains several non-smooth terms (hinge loss, $\ell_1$ norm, nuclear norm).
§ It is solved with the alternating direction method of multipliers (ADMM):
• Introduce an auxiliary variable C to turn the problem into a constrained problem.
• Define the augmented Lagrangian function.
• Iterate between updates of the primal and dual variables.
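The ADMM pattern (split the variables, minimize the augmented Lagrangian over each block in turn, then update the scaled dual variable) can be seen on a toy problem. This sketch is not the L-SIMPLE solver: it solves min 0.5‖x − a‖² + λ‖z‖₁ subject to x = z, whose exact answer is soft-thresholding of a. In a nuclear-norm problem the analogous z-update would use singular value thresholding, the proximal operator of the nuclear norm.

```python
def soft_threshold(v, t):
    """Proximal operator of t * |.|: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def admm_l1_denoise(a, lam, rho=1.0, n_iter=200):
    """Scaled-form ADMM for min_x 0.5*||x - a||^2 + lam*||z||_1 subject to x = z."""
    n = len(a)
    z = [0.0] * n
    u = [0.0] * n  # scaled dual variable
    for _ in range(n_iter):
        # x-update: the smooth quadratic subproblem has a closed form.
        x = [(a[i] + rho * (z[i] - u[i])) / (1.0 + rho) for i in range(n)]
        # z-update: proximal step on the non-smooth l1 term.
        z = [soft_threshold(x[i] + u[i], lam / rho) for i in range(n)]
        # Dual update on the constraint residual x - z.
        u = [u[i] + x[i] - z[i] for i in range(n)]
    return z

a = [3.0, -0.5, 1.2]
print(admm_l1_denoise(a, lam=1.0))
```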
Experiments: Dataset and settings
Dataset
• MassBank (402 spectra with their corresponding molecular structures)
• Mass range: [1, 963]
• 528-bit fingerprints, computed with OpenBabel
Settings
• Feature dimension: 500, obtained by dividing the mass range into bins
• Hyperparameters α, β, γ chosen by 5-fold CV
• Evaluation: accuracy and F1 score
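The binning step can be sketched as follows, assuming 500 equal-width bins over the stated mass range [1, 963] and keeping the largest intensity that falls into each bin. The slides do not say how intensities within a bin are aggregated, so that choice is an assumption, as is the peak list.

```python
def bin_spectrum(peaks, n_bins=500, mz_min=1.0, mz_max=963.0):
    """Map (m/z, intensity) peaks to a fixed-length vector of equal-width bins.

    Peaks outside [mz_min, mz_max] are ignored; within a bin the largest intensity is kept.
    """
    width = (mz_max - mz_min) / n_bins
    features = [0.0] * n_bins
    for mz, intensity in peaks:
        if mz < mz_min or mz > mz_max:
            continue
        index = min(int((mz - mz_min) / width), n_bins - 1)  # mz_max falls in the last bin
        features[index] = max(features[index], intensity)
    return features

# Hypothetical peak list; the last peak lies outside the mass range.
x = bin_spectrum([(1.0, 0.5), (500.0, 0.2), (963.0, 1.0), (1200.0, 0.9)])
print(len(x), [i for i, v in enumerate(x) if v > 0])
```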
Experiments: Results
Comparison of (L-)SIMPLE with existing methods.
(Figure annotation: ~300 times faster.)
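Accuracy and F1, the evaluation measures listed in the settings, can be computed for one fingerprint bit with labels in {−1, +1} as below. This is a generic sketch with hypothetical labels, not the evaluation code behind the reported results.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)

def f1_score(y_true, y_pred, positive=1):
    """F1 score for the positive class (bit present); 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical labels for one fingerprint bit over five test spectra.
y_true = [1, 1, -1, -1, 1]
y_pred = [1, -1, -1, 1, 1]
print(accuracy(y_true, y_pred), f1_score(y_true, y_pred))
```

F1 is the more informative of the two for sparse fingerprint bits, where always predicting "absent" already gives high accuracy.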
Experiments: Results
Case studies illustrating the effect of peak interactions, $w_1 x_1 + w_2 x_2 + W_{12} x_1 x_2$ (1)
Experiments: Results
Case studies illustrating the effect of peak interactions, $w_1 x_1 + w_2 x_2 + W_{12} x_1 x_2$ (2)
Experiments: Results
Case studies illustrating the effect of peak interactions, $w_1 x_1 + w_2 x_2 + W_{12} x_1 x_2$ (3)
* These case studies confirm the importance of peak interactions for fingerprint prediction.
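The case-study term w1·x1 + w2·x2 + W12·x1·x2 can be tabulated for two binary peak indicators to show how an interaction weight can flip a decision. The weights below are hypothetical and chosen only for illustration.

```python
def two_peak_term(x1, x2, w1, w2, w12):
    """Value of w1*x1 + w2*x2 + W12*x1*x2 for two peak features."""
    return w1 * x1 + w2 * x2 + w12 * x1 * x2

# Hypothetical weights: each peak alone pushes the score down, the pair pushes it up.
w1, w2, w12 = -0.4, -0.3, 1.5
for x1 in (0, 1):
    for x2 in (0, 1):
        with_interaction = two_peak_term(x1, x2, w1, w2, w12)
        main_only = two_peak_term(x1, x2, w1, w2, 0.0)
        print(f"x1={x1} x2={x2}  with W12: {with_interaction:+.2f}  main effects only: {main_only:+.2f}")
```

Only when both peaks are present does the interaction weight change the sign of the contribution; a model with main effects alone cannot express that.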
Summary
• Focused on fingerprint prediction (the first step) for metabolite identification.
• The proposed sparse interaction model (SIMPLE) incorporates peak interactions, is more interpretable, and is faster in prediction.
Recap: machine learning approach
§ Fingerprint-based two-step methods for metabolite identification.
Drawbacks of software-based fingerprints:
1) Not task-specific -> limited predictive performance
2) Large in size -> slow prediction
(Figure: the two-step pipeline, with a learning step that predicts software-computed fingerprints from spectra and a candidate retrieval step that ranks candidate structures.)
Motivation (2)
§ Learn a model that generates representations (molecular vectors) for metabolites from their molecular structures, with the following advantages:
1) Adaptive to the metabolite identification task
Molecular vectors are specific to the task and the data, and therefore non-redundant, which may lead to better predictive performance.
2) Shorter representations for metabolites
Molecular vectors need not be binary and can retain only task-relevant information, so they can be much shorter.
Motivation (2)
§ Fingerprint-based two-step methods for metabolite identification.
(Figure: a predictor maps spectra to software-based fingerprints, which software computes from structures.)
Motivation (2)
§ Learning molecular vectors for metabolite identification.
(Figure: a predictor maps spectra to molecular vectors, which a learned function 𝜙 generates from structures.)
Challenge: learning 𝜙 is hard because no supervised information is available.
Main idea: learn 𝜙 to maximize the correlation between spectra and structures.
Proposed method: ADAPTIVE
ADAPTIVE has two steps: learning and candidate retrieval.
1) Learning, which has two subtasks:
Subtask #1: learn a mapping from structures to molecular vectors, given spectrum-structure pairs from a spectra database (main technical contribution).
Subtask #2: learn a mapping from spectra to molecular vectors, given spectrum-vector pairs, where the vectors come from Subtask #1.
2) Candidate retrieval
Given a query spectrum, generate its molecular vector and use it to query the molecular vectors generated from candidate structures.
ADAPTIVE: Learning step
Idea: learn to maximize the correlation, measured by HSIC, between spectra and structures.
(Figure: Subtask 1 learns a mapping from structures to molecular vectors that maximizes HSIC with the spectra; Subtask 2 learns a mapping from spectra to molecular vectors.)
ADAPTIVE: Learning step
Subtask 1: Structures → Molecular vectors
§ Use a message passing neural network (MPNN) to parameterize 𝜙:
- it takes graphs (molecules) as inputs, and
- it learns representation vectors at different levels.
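A message passing network can be sketched in plain Python: each atom sums its neighbours' states, updates its own state, and a per-step readout sums over atoms, giving one vector per level; the levels are then concatenated. This is a toy, assuming fixed scalar weights instead of learned weight matrices and one-hot atom features; it is not the architecture used in ADAPTIVE.

```python
import math

def mpnn_embed(node_feats, edges, n_steps=2, self_weight=0.5, msg_weight=0.5):
    """Toy message passing: sum-aggregate neighbour states, tanh update, sum readout per step.

    node_feats: list of per-atom feature lists; edges: list of (u, v) bonds (undirected).
    Returns the concatenation of the readouts from every step ("levels").
    """
    n = len(node_feats)
    neighbours = [[] for _ in range(n)]
    for u, v in edges:
        neighbours[u].append(v)
        neighbours[v].append(u)
    h = [list(f) for f in node_feats]
    readouts = []
    for _ in range(n_steps):
        # Message: sum of the current states of all neighbours.
        messages = []
        for i in range(n):
            m = [0.0] * len(h[i])
            for j in neighbours[i]:
                m = [a + b for a, b in zip(m, h[j])]
            messages.append(m)
        # Update: every node is updated synchronously from the previous states.
        h = [[math.tanh(self_weight * a + msg_weight * b) for a, b in zip(h[i], messages[i])]
             for i in range(n)]
        # Readout: sum over atoms, which does not depend on atom ordering.
        readouts.append([sum(column) for column in zip(*h)])
    return [value for level in readouts for value in level]

# Hypothetical C-C-O skeleton with one-hot features [is_carbon, is_oxygen].
vec = mpnn_embed([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]], [(0, 1), (1, 2)])
print(len(vec), vec)
```

In a trainable version the scalar weights would become learned matrices, so that gradients of the HSIC objective can flow back into them.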
(Figure: structures are mapped by 𝜙 to molecular vectors, whose HSIC with the spectra is maximized.)
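§ Use the Hilbert-Schmidt Independence Criterion (HSIC) to measure the correlation between molecular vectors and spectra:
$$\mathrm{HSIC}(X, \phi(Y)) = \langle \tilde{K}_n, \tilde{\Phi}_n^\top \tilde{\Phi}_n \rangle$$
where $\tilde{K}_n$ is the centralized kernel matrix of the spectra set and $\tilde{\Phi}_n^\top \tilde{\Phi}_n$ is the centralized kernel matrix of the molecular vectors (this Frobenius inner product equals the usual HSIC estimate up to a normalization constant).
§ Repeatedly shuffle the samples and divide them into smaller subsets:
$$J = \frac{1}{T} \sum_{t=1}^{T} \frac{1}{n/B} \sum_{b=1}^{n/B} \langle \tilde{K}_{t,b}, \tilde{\Phi}_{t,b}^\top \tilde{\Phi}_{t,b} \rangle \qquad (1)$$
where $T$ is the number of shuffles, $B$ is the subset size, and $\tilde{K}_{t,b}$, $\tilde{\Phi}_{t,b}$ are computed on the $b$-th subset of the $t$-th shuffle.
§ The parameters of 𝜙 can be optimized by back-propagating through the HSIC-based objective (1).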
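The HSIC term and its subset-averaged version (1) can be sketched as below, assuming a linear kernel on the molecular vectors (so the kernel matrix is Φ̃ᵀΦ̃) and dropping any samples left over when n is not a multiple of B. The helper names are hypothetical; in practice the molecular vectors would come from 𝜙 and the objective would be maximized by back-propagation in an autodiff framework.

```python
import random

def center(K):
    """Return H K H with H = I - (1/n) 11^T, computed entrywise."""
    n = len(K)
    row_means = [sum(r) / n for r in K]
    col_means = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    grand_mean = sum(row_means) / n
    return [[K[i][j] - row_means[i] - col_means[j] + grand_mean for j in range(n)]
            for i in range(n)]

def linear_kernel(vectors):
    """Gram matrix Phi^T Phi of molecular vectors (one vector per sample)."""
    return [[sum(a * b for a, b in zip(u, v)) for v in vectors] for u in vectors]

def hsic(K_spectra, mol_vectors):
    """<K~, (Phi~)^T Phi~>: Frobenius inner product of the two centred kernel matrices."""
    Kc = center(K_spectra)
    Lc = center(linear_kernel(mol_vectors))
    n = len(Kc)
    return sum(Kc[i][j] * Lc[i][j] for i in range(n) for j in range(n))

def subset_hsic(K_spectra, mol_vectors, subset_size, n_shuffles, seed=0):
    """Objective J: average HSIC over subsets of size B drawn from T random shuffles."""
    rng = random.Random(seed)
    order = list(range(len(mol_vectors)))
    n_subsets = len(order) // subset_size  # leftover samples are dropped
    total = 0.0
    for _ in range(n_shuffles):
        rng.shuffle(order)
        for b in range(n_subsets):
            idx = order[b * subset_size:(b + 1) * subset_size]
            K_sub = [[K_spectra[i][j] for j in idx] for i in idx]
            total += hsic(K_sub, [mol_vectors[i] for i in idx])
    return total / (n_shuffles * n_subsets)

vectors = [[0.0], [1.0], [2.0]]
K = linear_kernel(vectors)  # hypothetical spectra kernel, here equal to the vector kernel
print(hsic(K, vectors), subset_hsic(K, vectors, subset_size=3, n_shuffles=2))
```

Working on small subsets keeps each kernel matrix B × B instead of n × n, which is what makes gradient-based training on larger datasets practical.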
ADAPTIVE: Learning step
Subtask 2: Spectra → Molecular vectors
§ Use Input Output Kernel Regression (IOKR) to learn h.
§ Find the mapping ĥ that minimizes the regularized least-squares error (2).
(Figure: mapping from spectra to molecular vectors.)
§ By the representer theorem, the solution can be represented as (3).
§ Substituting (3) for h in (2), the coefficients can be estimated easily.
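With a scalar kernel on spectra and an identity output operator, the IOKR solution reduces to kernel ridge regression with vector-valued outputs: C = (K + λI)⁻¹Ψ and ĥ(x) = Cᵀk_x, where Ψ stacks the training molecular vectors. The sketch assumes that simplified setting and an RBF kernel; the data are hypothetical, and it is not claimed to reproduce equations (2) and (3) exactly.

```python
import math

def rbf(u, v, gamma=1.0):
    """Gaussian (RBF) kernel between two feature vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

def solve(A, B):
    """Solve A X = B for X by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + list(B[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [row[n:] for row in M]

def fit_iokr(K, Psi, lam):
    """Coefficients C = (K + lam I)^{-1} Psi, one row per training spectrum."""
    n = len(K)
    A = [[K[i][j] + (lam if i == j else 0.0) for j in range(n)] for i in range(n)]
    return solve(A, Psi)

def predict_iokr(k_x, C):
    """h(x) = sum_i k(x, x_i) c_i, i.e. C^T k_x."""
    return [sum(k_x[i] * C[i][t] for i in range(len(C))) for t in range(len(C[0]))]

# Hypothetical training data: binned spectra and their molecular vectors from subtask 1.
X_train = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
Psi = [[0.5, -0.2], [-0.3, 0.8], [0.1, 0.4]]
K = [[rbf(a, b) for b in X_train] for a in X_train]
C = fit_iokr(K, Psi, lam=0.1)
query = [0.9, 0.1]
print(predict_iokr([rbf(query, a) for a in X_train], C))
```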
ADAPTIVE: Candidate retrieval
§ Convert the query spectrum x into a molecular vector using the learned ĥ.
§ Convert the candidate structures (y_1, y_2, y_3 in the figure) into molecular vectors using the learned 𝜙.
§ Find the candidate y whose molecular vector best matches ĥ(x).
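Candidate retrieval can be sketched as ranking candidates by the squared Euclidean distance between ĥ(x) and 𝜙(y_k). The distance is an assumption, since the retrieval criterion is not spelled out in the text, and the vectors below are hypothetical.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two molecular vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def rank_candidates(query_vector, candidate_vectors):
    """Sort candidate names by ||h(x) - phi(y_k)||^2, closest first."""
    return sorted(candidate_vectors,
                  key=lambda name: squared_distance(query_vector, candidate_vectors[name]))

# Hypothetical h(x) for a query spectrum and phi(y_k) for three candidate structures.
h_x = [0.2, 0.9, -0.1]
candidates = {"y_1": [0.1, 1.0, 0.0], "y_2": [0.9, -0.2, 0.3], "y_3": [0.3, 0.7, -0.2]}
print(rank_candidates(h_x, candidates))
```

Because 𝜙(y_k) does not depend on the query, the candidate vectors can be computed once for the whole database, so each query costs one ĥ evaluation plus one short distance per candidate.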
Experiments: Dataset and settings
• A benchmark dataset of 4138 spectra extracted from GNPS, divided into 10 subsets for cross-validation (CV).
• Performance (accuracy and computation time) was evaluated by 10-fold CV.
• Hyperparameters were determined by CV.
• 24 kernels defined for spectra are combined into a single kernel by multiple kernel learning, with two variants: UNIMKL (equal weights for all kernels) and ALIGNF (optimized weights for each kernel).
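The two kernel combination schemes can be contrasted in a short sketch. UNIMKL is simply the average of the kernels. ALIGNF optimizes the weights jointly to maximize centered alignment with a target kernel; the sketch instead uses the simpler per-kernel heuristic (sometimes called ALIGN) that weights each kernel by its own centered alignment. The target kernel and all matrices are hypothetical.

```python
import math

def center(K):
    """Return H K H with H = I - (1/n) 11^T."""
    n = len(K)
    row = [sum(r) / n for r in K]
    col = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    grand = sum(row) / n
    return [[K[i][j] - row[i] - col[j] + grand for j in range(n)] for i in range(n)]

def frobenius(A, B):
    """Frobenius inner product <A, B>."""
    return sum(a * b for ra, rb in zip(A, B) for a, b in zip(ra, rb))

def centered_alignment(K1, K2):
    """Cosine between the centred kernel matrices."""
    c1, c2 = center(K1), center(K2)
    return frobenius(c1, c2) / math.sqrt(frobenius(c1, c1) * frobenius(c2, c2))

def combine(kernels, weights):
    """Weighted sum of kernel matrices."""
    n = len(kernels[0])
    return [[sum(w * K[i][j] for w, K in zip(weights, kernels)) for j in range(n)]
            for i in range(n)]

def unimkl(kernels):
    """Equal weights for every kernel."""
    m = len(kernels)
    return combine(kernels, [1.0 / m] * m)

def align_weights(kernels, target):
    """Heuristic weights proportional to each kernel's (non-negative) alignment with the target."""
    scores = [max(centered_alignment(K, target), 0.0) for K in kernels]
    total = sum(scores)
    if total == 0.0:
        return [1.0 / len(kernels)] * len(kernels)
    return [s / total for s in scores]

K1 = [[1.0, 0.5, 0.0], [0.5, 1.0, 0.2], [0.0, 0.2, 1.0]]
K2 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
target = [[1.0, 0.6, 0.1], [0.6, 1.0, 0.3], [0.1, 0.3, 1.0]]
weights = align_weights([K1, K2], target)
print(unimkl([K1, K2]), weights, combine([K1, K2], weights))
```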
Experiments: Results
§ Predictive performance
Comparison of the top-1, top-10 and top-20 accuracies of ADAPTIVE with existing methods
ADAPTIVE improves accuracy (by ~4%) over existing methods, because the generated vectors are data-driven and task-specific.
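Top-k accuracy counts a query as correct when the true molecule appears among the k highest-ranked candidates. A minimal sketch with hypothetical scores and ranks; ties are resolved optimistically here (only strictly higher scores push the true candidate down), which is an assumption.

```python
def rank_of_true(scores, true_index):
    """1-based rank of the true candidate: 1 + number of candidates scored strictly higher."""
    return 1 + sum(1 for s in scores if s > scores[true_index])

def top_k_accuracy(true_ranks, k):
    """Fraction of queries whose true molecule is ranked within the top k."""
    return sum(1 for r in true_ranks if r <= k) / len(true_ranks)

# Hypothetical ranks of the correct molecule for six query spectra.
ranks = [1, 3, 12, 1, 25, 7]
print({k: top_k_accuracy(ranks, k) for k in (1, 10, 20)})
```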
Experiments: Results
§ Computational efficiency
Comparison of prediction time between ADAPTIVE and IOKR (considered the fastest existing method)
ADAPTIVE is faster (~4 times) than IOKR, because the generated vectors are much more concise than software-based fingerprints.
Summary
• We proposed ADAPTIVE, which learns a model that generates molecular vectors for metabolites in the metabolite identification task.
• The model is learned by maximizing the dependence between spectra and molecular structures (both available in spectra datasets).
• Molecular vectors are data-driven and task-specific, leading to improved accuracy.
• Molecular vectors are shorter than software-based fingerprints, leading to faster computation.
References
• D. H. Nguyen et al., “Recent advances and prospects of computational methods for metabolite identification: a review with emphasis on machine learning approaches”. Briefings in Bioinformatics 2018.
• D. H. Nguyen et al., “SIMPLE: sparse interaction model over peaks of molecules for fast, interpretable metabolite identification from MS/MS”. ISMB 2018.
• D. H. Nguyen et al., “ADAPTIVE: learning data-driven, concise molecular vectors for fast, accurate metabolite identification from MS/MS”. ISMB/ECCB 2019.
Thank you for listening
Q&A
Full Stack Web Development Course for BeginnersSabitha Banu
 
Meghan Sutherland In Media Res Media Component
Meghan Sutherland In Media Res Media ComponentMeghan Sutherland In Media Res Media Component
Meghan Sutherland In Media Res Media ComponentInMediaRes1
 
How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17Celine George
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxOH TEIK BIN
 
Biting mechanism of poisonous snakes.pdf
Biting mechanism of poisonous snakes.pdfBiting mechanism of poisonous snakes.pdf
Biting mechanism of poisonous snakes.pdfadityarao40181
 

Recently uploaded (20)

Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
 
Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in Pharmacovigilance
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha elections
 
Hierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementHierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of management
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptx
 
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfFraming an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
 
Pharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfPharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdf
 
Painted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of IndiaPainted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of India
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
 
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
 
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
 
internship ppt on smartinternz platform as salesforce developer
internship ppt on smartinternz platform as salesforce developerinternship ppt on smartinternz platform as salesforce developer
internship ppt on smartinternz platform as salesforce developer
 
Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for Beginners
 
Meghan Sutherland In Media Res Media Component
Meghan Sutherland In Media Res Media ComponentMeghan Sutherland In Media Res Media Component
Meghan Sutherland In Media Res Media Component
 
How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptx
 
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdfTataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
 
Biting mechanism of poisonous snakes.pdf
Biting mechanism of poisonous snakes.pdfBiting mechanism of poisonous snakes.pdf
Biting mechanism of poisonous snakes.pdf
 

Advanced machine learning for metabolite identification

  • 1. Machine Learning for Metabolite Identification from Mass spectrometry. Hai Dai Nguyen, Bioinformatics Center, Institute for Chemical Research, Kyoto University, 10/08/2019.
  • 2. Outline: 1. Introduction; 2. Problems, challenges and motivations; 3. Proposed methods; 4. Experiments; 5. Conclusions.
  • 3. Introduction: tandem mass spectrometry (MS/MS). The instrument fragments compounds (molecules) into fragments and records the abundance of each captured fragment; each fragment corresponds to a peak. Peaks interact with one another (co-occurrence of peaks). [Figure: peak interaction.]
  • 4. Metabolite identification § Given a query spectrum,find molecules in database with similar spectra Introduction: approaches for metabolite id. 10/08/2019 D.  H.  Nguyen,  Kyoto  University 4 Spectra database Query Spectra In silico spectra Structure database Query Spectra StructuresSpectra Fingerprints spectra Query Spectra library In silico fragmentation Machine Learning
  • 5. Introduction: machine learning approach. Fingerprint-based two-step methods for metabolite identification: 1) Learning step: given a set of spectra and their corresponding fingerprints, learn a model (predictor) to predict fingerprints. 2) Candidate retrieval step: given a query spectrum, use its predicted fingerprint to retrieve and rank molecules from a molecular database. [Figure: spectra -> Predictor -> fingerprints, with the fingerprints of candidate structures computed by software.]
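  • 6. Existing methods: FingerID (Bioinformatics, 2012). Kernel method: define a probability product kernel (PPK) for spectra, then use an SVM for classification. Each spectrum is represented as a mixture of per-peak distributions, $p_X = \frac{1}{n_X}\sum_{k=1}^{n_X} p_X(k)$ and $p_Y = \frac{1}{n_Y}\sum_{k=1}^{n_Y} p_Y(k)$, and the kernel is $K(X, Y) = \frac{1}{n_X n_Y}\sum_{i,j} p_X(i)\, p_Y(j)$, where each product of peak distributions is integrated over the peak space. Drawback: peak interactions are ignored.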
  • 7. Existing methods: CSI:FingerID (Bioinformatics, 2014), an improved version of FingerID. A kernel for spectra is defined by the PPK. Kernels for fragmentation trees are defined and combined with the PPK via multiple kernel learning (MKL). An SVM is then used for classification.
  • 8. Existing methods: CSI:FingerID (Bioinformatics, 2014). Fragmentation trees model how a molecule fragments in MS/MS. Nodes correspond to peaks, that is, to the molecular formulas of fragments. Edges correspond to losses, that is, uncharged fragments that are not captured. Trees can be predicted from spectra and provide structural information about the spectra.
  • 9. Existing methods: CSI:FingerID (Bioinformatics, 2014), pros and cons. Pro: improved accuracy, thanks to the additional structural information provided by the trees. Cons: it is computationally expensive, because trees must be computed from the spectra, and it lacks interpretability.
  • 10. Motivation (1): develop a learning model that incorporates peak interactions, is efficient to compute at prediction time, and is more interpretable.
  • 11. SIMPLE: sparse interaction model for MS. Idea: add an interaction term to the model (a two-way interaction model). Prediction model: $f(x) = b + w^\top x + x^\top W x$, $y(x) = \mathrm{sgn}(f(x))$, where $w^\top x$ is the main effect and $x^\top W x$ the interactions. Objective function: $\min_{b,w,W} \sum_{i=1}^{n} [1 - y_i f(x_i)]_+ + \alpha \|w\|_1 + \beta \|W\|_*$. The three terms are a hinge loss, a sparsity penalty (the $\ell_1$ norm on the main effects) and a low-rank penalty (the nuclear norm on the interactions). The objective is convex, so a globally optimal solution can be found.
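  • 12. L-SIMPLE: sparse interaction model for MS with Laplacian regularization. Idea: use background knowledge (interactions derived from fragmentation trees) to regularize $W$. Laplacian regularization on features works as follows. First, $x^\top W x = \sum_{i,j} w_{ij} x_i x_j = \sum_{i,j} (v_i^\top v_j) x_i x_j$, because $W$ can be decomposed as $W = V^\top V$, where $v_i$ are the columns of $V$. Second, $R(V) = \frac{1}{2}\sum_{i,j} A_{ij} \|v_i - v_j\|^2 = \mathrm{tr}(WL)$, where $A_{ij}$ encodes the interaction between features $i$ and $j$ and $L = D - A$ is the Laplacian matrix. New objective function: $\min_{b,w,W} \sum_{i=1}^{n} [1 - y_i f(x_i)]_+ + \alpha \|w\|_1 + \beta \|W\|_* + \gamma\, \mathrm{tr}(WL)$, which is still convex.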
  • 13. q Objective function with all non- smooth terms (1) q Alternating direction method of multipliers (ADMM) üIntroducing C to turn (1) to constrained problem (2) üDefining augmented Lagrange function (3) üIterating between primal and dual variables (4) (1) (2) (3) (4) 8/10/19 D.  H.  Nguyen,  Kyoto  University 13 L-SIMPLE: sparse interaction model for MS Optimization
  • 14. Dataset üMassBank (402 spectra + corresponding molecular structures) üMass range: [1, 963] üFingerprints of 528 bits, converted by OpenBabel. Settings üFeature dim: 500 by dividing mass range into bins. üHyperparameters 𝛼, 𝛽, 𝛾 are chosen by 5-fold CV. üEvaluation: Accuracy & F1 score. Experiments: Dataset and settings 10/08/2019 D.  H.  Nguyen,  Kyoto  University 14
  • 15. Experiments: results. Comparison of (L-)SIMPLE with existing methods: (L-)SIMPLE is about 300 times faster in prediction.
  • 16. Experiments: results. Case study (1) illustrating the effect of peak interactions: $w_1 x_1 + w_2 x_2 + W_{12} x_1 x_2$.
  • 17. Experiments: results. Case study (2) illustrating the effect of peak interactions: $w_1 x_1 + w_2 x_2 + W_{12} x_1 x_2$.
  • 18. Experiments: results. Case study (3) illustrating the effect of peak interactions: $w_1 x_1 + w_2 x_2 + W_{12} x_1 x_2$. These case studies confirm the importance of peak interactions for fingerprint prediction.
  • 19. Summary: we focused on fingerprint prediction (the first step) for metabolite identification. The proposed sparse interaction model (SIMPLE) incorporates peak interactions, is more interpretable and is faster in prediction.
  • 20. Recap: machine learning approach. Fingerprint-based two-step methods for metabolite identification consist of 1) a learning step to predict fingerprints and 2) a candidate retrieval step to rank candidates. Software-based fingerprints have two drawbacks. First, they are not task-specific, which limits predictive performance. Second, they are large, which slows down prediction. [Figure: spectra -> Predictor -> fingerprints, with the fingerprints of structures computed by software; query spectra -> Predictor -> ranked candidates.]
  • 21. Motivation (2): learn a model that generates representations (molecular vectors) of metabolites from their molecular structures. This has two advantages. 1) Adapted to the metabolite identification task: molecular vectors are specific to the task and data, and therefore non-redundant, possibly leading to better predictive performance. 2) Shorter representations of metabolites: molecular vectors need not be binary and can concentrate the information relevant to the task, so they are much smaller.
  • 22. Motivation (2): fingerprint-based two-step methods for metabolite identification. [Figure: spectra -> Predictor -> software-based fingerprints, which software computes from structures.]
  • 23. Motivation (2): learning molecular vectors for metabolite identification. [Figure: spectra -> Predictor -> molecular vectors, which a learned mapping $\phi$ generates from structures.] Challenge: learning $\phi$ is hard because no supervised information is available. Main idea: learn $\phi$ to maximize the correlation between spectra and structures.
  • 24. Proposed method: ADAPTIVE. It has two steps: learning and candidate retrieval. 1) Learning consists of two subtasks. Subtask #1 (the main technical contribution): learn a mapping from structures to molecular vectors, given spectrum-structure pairs from a spectral database. Subtask #2: learn a mapping from spectra to molecular vectors, given spectrum-vector pairs (the molecular vectors come from Subtask #1). 2) Candidate retrieval: given a query spectrum, generate its molecular vector and compare it against the molecular vectors generated from candidate structures.
  • 25. ADAPTIVE: learning step. Idea: learn to maximize the correlation (measured by HSIC) between spectra and structures. [Figure: Subtask 1 learns the mapping from structures to molecular vectors by maximizing HSIC with the spectra; Subtask 2 learns the mapping from spectra to molecular vectors.]
  • 26. ADAPTIVE: learning step, Subtask 1: structures → molecular vectors. A message passing neural network (MPNN) parameterizes $\phi$. It takes graphs (molecules) as inputs and learns representation vectors at different levels. The Hilbert-Schmidt Independence Criterion (HSIC) measures the correlation between the vectors and the spectra: $\mathrm{HSIC}(X, \phi(Y)) = \langle \tilde{K}_n, \tilde{\Phi}_n^\top \tilde{\Phi}_n \rangle$. Here $\tilde{K}_n$ is the centered kernel matrix of the spectra and $\tilde{\Phi}_n^\top \tilde{\Phi}_n$ is the centered kernel matrix of the molecular vectors. The samples are shuffled $T$ times, and each time they are divided into $n/B$ smaller subsets: $J = \frac{1}{T}\sum_{t=1}^{T} \frac{1}{n/B} \sum_{b=1}^{n/B} \langle \tilde{K}_{t,b}, \tilde{\Phi}_{t,b}^\top \tilde{\Phi}_{t,b} \rangle$ (1). The parameters of $\phi$ can be optimized by back-propagating the HSIC-based loss (1).
  • 27. ADAPTIVE: learning step, Subtask 2: spectra → molecular vectors. Input Output Kernel Regression (IOKR) is used to learn $h$: find the mapping $\hat{h}$ that minimizes the regularized least-squares error (2). According to the representer theorem, the solution can be represented in the form (3). Substituting (3) into (2), $\hat{h}$ can be estimated easily. [Equations (2) and (3) not shown.]
  • 28. ADAPTIVE: candidate retrieval. Given a query spectrum $\mathbf{x}$, the learned $\hat{h}$ converts it into a molecular vector. The learned $\phi$ converts the candidate structures $y_1, y_2, \ldots$ into molecular vectors. The method then finds the candidate $y$ whose molecular vector best matches the query's. [Retrieval criterion not shown.]
  • 29. Experiments: dataset and settings. A benchmark dataset of 4138 spectra extracted from GNPS is divided into 10 subsets for cross-validation (CV). Performance (accuracy and computation time) is evaluated by 10-fold CV, and hyperparameters are selected by CV. 24 kernels defined for spectra are combined into a single kernel by multiple kernel learning. Two methods are used: UNIMKL (equal weights for all kernels) and ALIGNF (optimized weights).
  • 30. Experiments: results. Predictive performance: comparison of the top-1, top-10 and top-20 accuracies of ADAPTIVE with existing methods. ADAPTIVE improves accuracy by about 4% over existing methods, because the generated vectors are data-driven and task-specific.
  • 31. Experiments: results. Computational efficiency: comparison of prediction time for ADAPTIVE and IOKR (considered the fastest existing method). ADAPTIVE is about 4 times faster than IOKR, because the generated vectors are much more concise than software-based fingerprints.
  • 32. Summary. We proposed ADAPTIVE, which learns a model to generate molecular vectors for metabolites in the metabolite identification task. The model is learned by maximizing the dependency between spectra and molecular structures (both available in spectral datasets). The molecular vectors are data-driven and task-specific, leading to improved accuracy. They are also shorter than software-based fingerprints, leading to faster computation.
  • 33. References. D. H. Nguyen et al., "Recent advances and prospects of computational methods for metabolite identification: a review with emphasis on machine learning approaches," Briefings in Bioinformatics, 2018. D. H. Nguyen et al., "SIMPLE: sparse interaction model over peaks of molecules for fast, interpretable metabolite identification from MS/MS," ISMB 2018. D. H. Nguyen et al., "ADAPTIVE: learning data-driven, concise molecular vectors for fast, accurate metabolite identification from MS/MS," ISMB/ECCB 2019.
  • 34. Thank you for listening. Q&A
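A minimal sketch of the PPK from slide 6. It assumes that each peak is a 2D Gaussian over (m/z, intensity) with a shared diagonal covariance; this is a hypothetical choice and not necessarily FingerID's exact setup. Under that assumption, the product of two peak Gaussians integrates to a closed form, so the kernel reduces to a double sum over peak pairs. The widths `sigma_mz` and `sigma_int` are illustrative values, not taken from the talk.

```python
import math

def peak_overlap(p, q, sigma_mz=0.01, sigma_int=0.1):
    """Integral of N(z; p, S) * N(z; q, S) over z, with S = diag(sigma_mz^2, sigma_int^2).

    Two Gaussians with equal covariance S integrate to N(p; q, 2S), which factorizes
    over the two axes because S is diagonal. The sigma defaults are illustrative.
    """
    value = 1.0
    for a, b, s in ((p[0], q[0], sigma_mz), (p[1], q[1], sigma_int)):
        var = 2.0 * s * s  # variance of N(p; q, 2S) along this axis
        value *= math.exp(-((a - b) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)
    return value

def ppk(spec_x, spec_y, sigma_mz=0.01, sigma_int=0.1):
    """K(X, Y) = 1/(n_X n_Y) * sum over peak pairs (i, j) of the integral of p_X(i) p_Y(j).

    Spectra are lists of (m/z, intensity) peaks.
    """
    total = sum(peak_overlap(p, q, sigma_mz, sigma_int) for p in spec_x for q in spec_y)
    return total / (len(spec_x) * len(spec_y))
```

Each term in the sum involves one peak from each spectrum. No term depends jointly on two peaks of the same spectrum, and this is the sense in which the kernel ignores peak interactions.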
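Slides 7 and 29 combine several spectral kernels by multiple kernel learning. The sketch below shows two weighting schemes. The first is uniform weighting, as in UNIMKL. The second is a simpler stand-in for ALIGNF: each kernel's weight is proportional to its centered alignment with a target kernel $yy^\top$. ALIGNF itself optimizes the weights jointly with a quadratic program, which is not reproduced here. Kernels are plain nested lists, and all function names are hypothetical.

```python
import math

def center(K):
    """Return H K H with H = I - (1/n) 11^T, computed without forming H."""
    n = len(K)
    row = [sum(r) / n for r in K]
    col = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    tot = sum(row) / n
    return [[K[i][j] - row[i] - col[j] + tot for j in range(n)] for i in range(n)]

def frobenius(A, B):
    """Frobenius inner product <A, B> = sum_ij A_ij B_ij."""
    return sum(a * b for ra, rb in zip(A, B) for a, b in zip(ra, rb))

def alignment(K, T):
    """Centered kernel alignment between kernel K and target kernel T."""
    Kc, Tc = center(K), center(T)
    return frobenius(Kc, Tc) / math.sqrt(frobenius(Kc, Kc) * frobenius(Tc, Tc))

def unimkl_weights(kernels):
    """UNIMKL: equal weight for every kernel."""
    return [1.0 / len(kernels)] * len(kernels)

def align_weights(kernels, target):
    """Simplified stand-in for ALIGNF: weights proportional to each kernel's
    nonnegative (clipped) alignment with the target, normalized to sum to 1."""
    scores = [max(alignment(K, target), 0.0) for K in kernels]
    total = sum(scores)
    return [s / total for s in scores]

def combine(kernels, weights):
    """Weighted sum of kernel matrices."""
    n = len(kernels[0])
    return [[sum(w * K[i][j] for w, K in zip(weights, kernels)) for j in range(n)]
            for i in range(n)]
```

For one fingerprint bit with labels $y_i \in \{+1, -1\}$, the target would be `T[i][j] = y[i] * y[j]`.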
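The node and edge reading of fragmentation trees on slide 8 can be made concrete with a hypothetical sketch. Each node is labeled with a molecular formula, and each edge's loss is the formula difference between parent and child. The tree and formulas below are illustrative only. Computing an actual fragmentation tree from a spectrum (scoring and selecting the tree) is not shown.

```python
import re
from collections import Counter

def parse_formula(formula):
    """Parse a simple formula such as 'C6H12O6' into element counts."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts

def hill_notation(counts):
    """Format counts in Hill order: C, then H, then the rest alphabetically (no C: all alphabetical)."""
    present = [e for e in counts if counts[e] > 0]
    if "C" in present:
        rest = sorted(e for e in present if e not in ("C", "H"))
        order = ["C"] + (["H"] if "H" in present else []) + rest
    else:
        order = sorted(present)
    return "".join(e + (str(counts[e]) if counts[e] > 1 else "") for e in order)

def neutral_loss(parent, child):
    """Loss on the edge parent -> child: the uncharged fragment that is not captured."""
    diff = parse_formula(parent)
    diff.subtract(parse_formula(child))
    if any(n < 0 for n in diff.values()):
        raise ValueError(f"{child} is not contained in {parent}")
    return hill_notation(+diff)  # unary + drops elements with zero count

def edge_losses(tree):
    """tree maps a node formula to its child formulas; yields (parent, child, loss)."""
    for parent, children in tree.items():
        for child in children:
            yield parent, child, neutral_loss(parent, child)
```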
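The SIMPLE prediction model on slide 11 and the case studies on slides 16 to 18 can be illustrated with a small sketch. It evaluates $f(x) = b + w^\top x + x^\top W x$ and splits the result into bias, main-effect and pairwise-interaction terms. The parameter values in the usage note are made up to show how an interaction term can flip a predicted fingerprint bit. Training with the $\ell_1$ and nuclear-norm regularized hinge loss is not shown, and predicting $+1$ when $f(x) = 0$ is an arbitrary tie-break.

```python
def decision_value(x, b, w, W):
    """f(x) = b + w^T x + x^T W x for dense Python lists."""
    d = len(x)
    main = sum(w[i] * x[i] for i in range(d))
    inter = sum(W[i][j] * x[i] * x[j] for i in range(d) for j in range(d))
    return b + main + inter

def predict(x, b, w, W):
    """y(x) = sgn(f(x)), with ties mapped to +1."""
    return 1 if decision_value(x, b, w, W) >= 0 else -1

def hinge_loss(X, y, b, w, W):
    """Sum of [1 - y_i f(x_i)]_+ over the training set."""
    return sum(max(0.0, 1.0 - yi * decision_value(xi, b, w, W)) for xi, yi in zip(X, y))

def contributions(x, b, w, W):
    """Split f(x) into bias, main effects w_i x_i and interaction terms.

    x^T W x counts the pair (i, j) twice, so its term is (W_ij + W_ji) x_i x_j.
    """
    d = len(x)
    parts = {"bias": b}
    for i in range(d):
        parts[f"w{i}*x{i}"] = w[i] * x[i]
        parts[f"W{i}{i}*x{i}^2"] = W[i][i] * x[i] * x[i]
        for j in range(i + 1, d):
            parts[f"(W{i}{j}+W{j}{i})*x{i}*x{j}"] = (W[i][j] + W[j][i]) * x[i] * x[j]
    return parts
```

As a usage example, take `x = [1.0, 1.0, 0.0]`, `b = -0.5`, `w = [0.2, 0.1, 0.4]` and a single interaction `W[0][1] = 0.3`. The main effects alone give $f = -0.2$ (bit off). The co-occurrence of peaks 0 and 1 adds 0.3, so $f = 0.1$ (bit on).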
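The identity behind the Laplacian regularizer on slide 12 can be checked numerically. Let $W = V^\top V$, where $v_i$ are the columns of $V$, and let $L = D - A$. Then $\tfrac{1}{2}\sum_{i,j} A_{ij}\|v_i - v_j\|^2 = \mathrm{tr}(WL)$. The sketch computes both sides for a small hypothetical feature graph. The factor $\tfrac{1}{2}$ compensates for summing over ordered pairs.

```python
def laplacian(A):
    """L = D - A, with D the diagonal degree matrix of the symmetric adjacency A."""
    n = len(A)
    return [[(sum(A[i]) if i == j else 0.0) - A[i][j] for j in range(n)] for i in range(n)]

def smoothness(A, vs):
    """(1/2) * sum_{i,j} A_ij * ||v_i - v_j||^2, where vs[i] is the embedding v_i of feature i."""
    n = len(vs)
    return 0.5 * sum(A[i][j] * sum((a - b) ** 2 for a, b in zip(vs[i], vs[j]))
                     for i in range(n) for j in range(n))

def trace_WL(A, vs):
    """tr(W L) with W_ij = v_i^T v_j, i.e. W = V^T V."""
    n = len(vs)
    W = [[sum(a * b for a, b in zip(vs[i], vs[j])) for j in range(n)] for i in range(n)]
    L = laplacian(A)
    return sum(W[i][j] * L[j][i] for i in range(n) for j in range(n))
```

Because $\mathrm{tr}(WL)$ is linear in $W$, adding $\gamma\,\mathrm{tr}(WL)$ with $\gamma \ge 0$ keeps the objective convex, while still penalizing dissimilar embeddings for features that the graph marks as interacting.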
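Slide 13 solves L-SIMPLE with ADMM. The exact L-SIMPLE splitting is not given here, so the sketch below only illustrates the same primal/dual iteration on a toy problem: scaled-form ADMM for $\min_{x,z} \tfrac{1}{2}\|x - a\|^2 + \lambda\|z\|_1$ subject to $x = z$. Its solution is known in closed form (soft-thresholding of $a$ at $\lambda$), which makes the iteration easy to check. In L-SIMPLE, the analogous proximal step for the nuclear-norm term would soft-threshold the singular values of $W$.

```python
import math

def soft_threshold(v, t):
    """Proximal operator of t*||.||_1: shrink each entry toward zero by t."""
    return [math.copysign(max(abs(a) - t, 0.0), a) for a in v]

def admm_l1_denoise(a, lam, rho=1.0, n_iter=500):
    """Scaled-form ADMM for min 0.5*||x - a||^2 + lam*||z||_1 subject to x = z."""
    n = len(a)
    z = [0.0] * n
    u = [0.0] * n  # scaled dual variable
    for _ in range(n_iter):
        # x-update: minimize 0.5*||x - a||^2 + (rho/2)*||x - z + u||^2 (closed form)
        x = [(a[i] + rho * (z[i] - u[i])) / (1.0 + rho) for i in range(n)]
        # z-update: proximal step on the l1 term
        z = soft_threshold([x[i] + u[i] for i in range(n)], lam / rho)
        # dual update: accumulate the constraint residual x - z
        u = [u[i] + x[i] - z[i] for i in range(n)]
    return z
```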
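Slide 14 turns each spectrum into a 500-dimensional feature vector by dividing the mass range [1, 963] into bins. Below is a minimal sketch. It assumes equal-width bins and keeps the maximum intensity when several peaks fall into one bin; the talk states neither rule.

```python
def bin_spectrum(peaks, n_bins=500, mz_min=1.0, mz_max=963.0):
    """Map (m/z, intensity) peaks to a fixed-length vector of equal-width m/z bins.

    Assumptions: peaks outside [mz_min, mz_max] are dropped, and a bin keeps the
    largest intensity among the peaks that fall into it.
    """
    width = (mz_max - mz_min) / n_bins
    x = [0.0] * n_bins
    for mz, intensity in peaks:
        if mz < mz_min or mz > mz_max:
            continue
        idx = min(int((mz - mz_min) / width), n_bins - 1)  # mz_max maps to the last bin
        x[idx] = max(x[idx], intensity)
    return x
```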
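A sketch of the HSIC quantity and the mini-batch objective $J$ from slide 26. It assumes a precomputed spectra kernel matrix and a linear kernel on molecular vectors, so the vector kernel is $\Phi^\top \Phi$. It computes the unnormalized inner product of centered kernel matrices, as written on the slide. Back-propagation through an MPNN is out of scope, so the molecular vectors are fixed lists, and all function names are hypothetical.

```python
import random

def center(K):
    """Return H K H with H = I - (1/n) 11^T, computed without forming H."""
    n = len(K)
    row = [sum(r) / n for r in K]
    col = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    tot = sum(row) / n
    return [[K[i][j] - row[i] - col[j] + tot for j in range(n)] for i in range(n)]

def hsic(K, L):
    """Unnormalized HSIC: Frobenius inner product of the centered kernel matrices."""
    Kc, Lc = center(K), center(L)
    return sum(a * b for ra, rb in zip(Kc, Lc) for a, b in zip(ra, rb))

def linear_kernel(vectors):
    """Gram matrix Phi^T Phi of molecular vectors (one vector per sample)."""
    return [[sum(a * b for a, b in zip(u, v)) for v in vectors] for u in vectors]

def minibatch_hsic(K, vectors, batch_size, n_rounds, seed=0):
    """J = (1/T) sum_t (1/(n/B)) sum_b HSIC on mini-batch b of shuffle t."""
    rng = random.Random(seed)
    idx = list(range(len(K)))
    n_batches = len(K) // batch_size  # leftover samples are skipped in each round
    total = 0.0
    for _ in range(n_rounds):
        rng.shuffle(idx)
        for b in range(n_batches):
            sub = idx[b * batch_size:(b + 1) * batch_size]
            K_sub = [[K[i][j] for j in sub] for i in sub]
            L_sub = linear_kernel([vectors[i] for i in sub])
            total += hsic(K_sub, L_sub)
    return total / (n_rounds * n_batches)
```

Working on small shuffled subsets keeps each kernel matrix at size $B \times B$ instead of $n \times n$. This is what makes repeated gradient steps on $J$ affordable.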
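Slide 27 notes that, by the representer theorem, the IOKR mapping $\hat{h}$ can be estimated easily. Below is a minimal sketch. It assumes the common special case in which the output operator is the identity, so the problem reduces to kernel ridge regression. The coefficient matrix solves $(K + \lambda I)C = \Phi$, and a query spectrum is mapped to $\sum_i k(x, x_i)\, c_i$. The small Gauss-Jordan solver is only there to keep the example dependency-free.

```python
def solve(A, B):
    """Solve A X = B (A: n x n, B: n x m) by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, A[i])) + list(map(float, B[i])) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]  # assumes A is nonsingular (true for K + lam*I with lam > 0)
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0.0:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]

def fit_iokr(K, targets, lam):
    """Coefficients C solving (K + lam*I) C = targets.

    K is the n x n spectra kernel; row i of targets is the molecular vector of spectrum i.
    """
    n = len(K)
    A = [[K[i][j] + (lam if i == j else 0.0) for j in range(n)] for i in range(n)]
    return solve(A, targets)

def predict_vector(k_query, C):
    """h(x) = sum_i k(x, x_i) c_i, where k_query[i] = k(x, x_i)."""
    return [sum(k_query[i] * C[i][t] for i in range(len(C))) for t in range(len(C[0]))]
```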
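The sketch below covers candidate retrieval (slide 28) and the top-k accuracies reported on slide 30. It ranks candidate structures by the squared Euclidean distance between each candidate's molecular vector and the vector predicted for the query. It then measures how often the correct structure appears among the top k. The distance criterion is an assumption, since the slide's selection formula is not reproduced in the text, and the candidate identifiers are hypothetical.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two molecular vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def rank_candidates(predicted, candidates):
    """Sort candidate ids by distance to the predicted vector, closest first.

    candidates maps an id to the molecular vector generated from its structure.
    """
    return sorted(candidates, key=lambda cid: squared_distance(predicted, candidates[cid]))

def top_k_accuracy(rankings, truths, k):
    """Fraction of queries whose true candidate id is among the first k ranked ids."""
    hits = sum(1 for ranking, truth in zip(rankings, truths) if truth in ranking[:k])
    return hits / len(truths)
```

Shorter molecular vectors make each distance computation cheaper, which is consistent with the speed-up over fingerprint-based retrieval reported on slide 31.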