Enzyme Annotation using Conditional Ranking Algorithms
Michiel Stock
Faculty of Bioscience Engineering
Ghent University
6th of June 2014
KERMIT
Michiel Stock (KERMIT) Conditional Ranking of Enzymes 6th of June 2014 1 / 14
Outline
1 From Structure to Function
2 Ranking Enzymes
3 Learning to Rank
4 Results
5 Conclusion
From Structure to Function
What bioinformatics is (often) about
Bioinformatics for proteins
Using biological knowledge and statistical models to map information
from a low level (e.g. protein structure) to a higher level (e.g. molecular
function).
Sequence → Structure → Function
The data set
Data:
two data sets of ca. 1600 enzymes with 21 different functions
five different similarity measures of the active site
[Figure: active site of an enzyme]
The enzyme commission number
Ranking Enzymes
Quantifying enzyme function similarity
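[Figure: graph of the enzymes EC 2.7.7.12, EC 2.7.7.34, EC 2.7.1.12, EC 4.2.3.90 and EC 4.6.1.11 together with an unannotated query EC ?.?.?.?; the edges between annotated enzymes are labelled with their catalytic similarity (values between 0 and 3 in this example)]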
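One way to make this similarity concrete is sketched below. It assumes that the catalytic similarity of two enzymes is the number of leading EC levels they share, which gives a value between 0 and 4; the slides do not spell out the definition, and the function name `ec_similarity` is hypothetical.

```python
def ec_similarity(ec_a: str, ec_b: str) -> int:
    """Count how many leading EC levels two EC numbers share (0 to 4).

    Assumption: similarity = length of the common prefix of the EC numbers.
    An unknown level ("?") never counts as a match.
    """
    shared = 0
    for level_a, level_b in zip(ec_a.split("."), ec_b.split(".")):
        if level_a != level_b or "?" in (level_a, level_b):
            break
        shared += 1
    return shared


print(ec_similarity("2.7.7.12", "2.7.7.34"))  # 3: same class, subclass and sub-subclass
print(ec_similarity("2.7.7.12", "2.7.1.12"))  # 2: differ at the third level
print(ec_similarity("2.7.7.12", "4.2.3.90"))  # 0: different main class
```

Comparing only the prefix matters: EC 2.7.7.12 and EC 2.7.1.12 share their last digit, but that digit carries no meaning once the third level differs.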
Conditional ranking of enzymes
Ranking enzymes
For an unannotated query enzyme, rank the annotated enzymes so that those at the top of the ranking have a function similar to that of the query.
Minimize the ranking error: the number of switches needed to obtain a perfect ranking.
Example: suppose one has an enzyme with unknown function, EC ?.?.?.?
1 EC 2.7.7.12
2 EC 2.7.7.12
3 EC 2.7.7.34
4 EC 2.7.1.12
5 EC 2.7.7.34
6 EC 4.2.3.90
7 EC 1.14.11
8 EC 4.6.1.11
⇒ EC 2.7.7.12
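The ranking error for this example can be counted with a small sketch. It assumes that the true function of the query is EC 2.7.7.12 and that the relevance of each ranked enzyme is the number of leading EC levels it shares with the query; the relevance values and the name `count_switches` are illustrative, not taken from the slides.

```python
def count_switches(relevances):
    """Number of adjacent swaps needed to sort the list in descending order.

    This equals the number of pairs that appear in the wrong relative order.
    """
    switches = 0
    for i in range(len(relevances)):
        for j in range(i + 1, len(relevances)):
            if relevances[i] < relevances[j]:
                switches += 1
    return switches


# Assumed relevances of the eight ranked enzymes with respect to EC 2.7.7.12:
# 2.7.7.12, 2.7.7.12, 2.7.7.34, 2.7.1.12, 2.7.7.34, 4.2.3.90, 1.14.11, 4.6.1.11
ranked_relevances = [4, 4, 3, 2, 3, 0, 0, 0]

print(count_switches(ranked_relevances))  # 1
```

Only the enzymes at positions 4 and 5 are out of order, so a single switch yields a perfect ranking.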
Learning to Rank
Learning the catalytic similarity
pair of enzymes: $e = (v, v')$
label $y_e \in \{0, 1, 2, 3, 4\}$: the catalytic similarity
five different structural similarities: $K^\phi(v, v')$

Catalytic similarity labels (rows and columns are enzymes; cells left blank are not filled in on the slide):

     A  B  C  D  E  F  G
A    4  4  0  0  0
B    4  4  0  0  0
C    0  0  4  2  1
D    0  0  2  4  3
E    0  0  1  3  4
F
G
Pairwise features with the Kronecker product
[Figure: pairs of objects are mapped from the object kernel to a pairwise kernel, which is passed to a learning algorithm (e.g. SVM, RLS)]
The Kronecker kernel is defined as:
$$K^\Phi\big((v, v'), (\bar{v}, \bar{v}')\big) = K^\Phi(e, \bar{e}) = K^\phi(v, \bar{v})\, K^\phi(v', \bar{v}')$$
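The definition can be checked numerically with a minimal NumPy sketch. The object kernel below is a toy linear kernel on random features, which is an assumption made purely for illustration; the pair $(v, v')$ is stored at row-major index `i * n + j`.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 2))   # toy feature vectors for 3 objects (illustrative only)
K = X @ X.T                   # object kernel, shape (3, 3)
n = K.shape[0]

# Pairwise kernel over all n * n ordered pairs, shape (9, 9)
K_pair = np.kron(K, K)


def pair_index(i, j, n):
    """Row-major position of the pair (i, j) among the n * n pairs."""
    return i * n + j


# Compare one entry with the product of object-kernel values
v, v_prime, w, w_prime = 0, 2, 1, 2
lhs = K_pair[pair_index(v, v_prime, n), pair_index(w, w_prime, n)]
rhs = K[v, w] * K[v_prime, w_prime]
print(np.isclose(lhs, rhs))
```

The pairwise kernel matrix has $n^2 \times n^2$ entries, so forming it explicitly is only practical for small $n$.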
Basic pairwise models
Use training data $T = \{(e, y_e)\}$ to fit a model:
$$h(e) = \sum_{\bar{e} \in T} a_{\bar{e}}\, K^\Phi(e, \bar{e}).$$
The function $h \in \mathcal{H}$ can be fitted using the following optimisation problem:
$$\mathcal{A}(T) = \arg\min_{h \in \mathcal{H}} L(h, T) + \lambda \|h\|_{\mathcal{H}}^2.$$
For conditional ranking we choose an approximation of the rank loss.
This problem has time complexity $O(n^3)$, with $n$ the number of enzymes.
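The cubic cost can be illustrated with a sketch that replaces the rank loss by a plain squared-error loss, which is a simplification and not the loss used in the slides. It also assumes that a label is available for every pair of training enzymes. Under those assumptions the coefficients solve $(K \otimes K + \lambda I)\,\mathrm{vec}(A) = \mathrm{vec}(Y)$, and a single eigendecomposition of the $n \times n$ object kernel is enough; the function names are hypothetical.

```python
import numpy as np


def fit_kronecker_rls(K, Y, lam):
    """Solve (K kron K + lam * I) vec(A) = vec(Y) without forming K kron K.

    K: symmetric positive semi-definite object kernel, shape (n, n)
    Y: labels for all ordered pairs, shape (n, n)
    """
    eigvals, U = np.linalg.eigh(K)                      # O(n^3)
    Y_rot = U.T @ Y @ U                                 # rotate labels into the eigenbasis
    A_rot = Y_rot / (np.outer(eigvals, eigvals) + lam)  # element-wise shrinkage
    return U @ A_rot @ U.T


def predict_all_pairs(K, A):
    """h(e) for every pair e = (v, v'): entry (i, j) is sum_kl A[k, l] K[i, k] K[j, l]."""
    return K @ A @ K


rng = np.random.default_rng(0)
n = 5
X = rng.normal(size=(n, 3))                        # toy features (illustrative only)
K = X @ X.T
Y = rng.integers(0, 5, size=(n, n)).astype(float)  # toy labels in {0, ..., 4}

A = fit_kronecker_rls(K, Y, lam=1.0)

# Reference solution with the explicit n^2 x n^2 system, feasible only for tiny n
A_direct = np.linalg.solve(np.kron(K, K) + np.eye(n * n), Y.ravel()).reshape(n, n)
print(np.allclose(A, A_direct))
```

The explicit system would cost $O(n^6)$ to solve, whereas the eigendecomposition route needs only $O(n^3)$ operations.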
Results
Qualitative improvement in the enzyme similarities
Example for CavBase structural similarity:
[Figure: three similarity matrices, labelled ground truth, supervised and unsupervised; lighter color = higher similarity]
Improvement of the ROC curves
ROC curves for the five different structural similarity measures (CB, FP, LPCS, MCS, SW): unsupervised and supervised
[Figure: ROC curves for the different enzyme similarity measurements of data set I; x-axis: false positive rate, y-axis: average true positive rate, both ranging from 0.0 to 1.0]
Improvement
Increase of AUC from ca. 0.7 to more than 0.8!
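For readers less familiar with the metric, the AUC of a single ranking can be computed as the fraction of (relevant, irrelevant) pairs that are ordered correctly. The sketch below uses made-up scores and binary labels for one query; the curves on the slide report an average true positive rate, and how that average is taken is not stated there.

```python
def auc(scores, labels):
    """Fraction of (positive, negative) pairs where the positive scores higher.

    Ties count as half a correctly ordered pair.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example: 1 = functionally related to the query, 0 = unrelated
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]

print(auc(scores, labels))  # 8 of the 9 pairs are ordered correctly
```

An AUC of 0.5 corresponds to a random ranking and 1.0 to a perfect one.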
Conclusion
General conclusions
1 enzyme function prediction can nicely be cast in a conditional ranking framework
2 supervised ranking is a clear improvement upon the baseline
3 efficient enough for many bioinformatics applications
4 can be generalised to many other settings
Acknowledgements
Ghent University
Bernard De Baets
Willem Waegeman
University of Turku
Tapio Pahikkala
Antti Airola
University of Marburg
Thomas Fober
Eyke Hüllermeier
Want to know more?
[1] T. Pahikkala, A. Airola, M. Stock, B. De Baets, and W. Waegeman. Efficient regularized least-squares algorithms for conditional ranking on relational data. Machine Learning, 93(2-3):321–356, 2013.
[2] M. Stock, T. Fober, E. Hüllermeier, S. Glinca, G. Klebe, T. Pahikkala, A. Airola, B. De Baets, and W. Waegeman. Identification of functionally related enzymes by learning-to-rank methods. IEEE Transactions on Computational Biology and Bioinformatics, accepted for publication, 2014.
