Enzyme Annotation using Conditional Ranking Algorithms

Presentation for the Benelearn conference on applying conditional ranking algorithms to predict enzyme function from enzyme structure.

Outline
1 From Structure to Function
2 Ranking Enzymes
3 Learning to Rank
4 Results
5 Conclusion
Michiel Stock (KERMIT) Conditional Ranking of Enzymes 6th of June 2014 2 / 14

From Structure to Function
What bioinformatics is (often) about
Bioinformatics for proteins
Using biological knowledge and statistical models to map information from a low level (e.g. protein structure) to a higher level (e.g. molecular function).
Sequence → Structure → Function

From Structure to Function
The data set
Data:
two data sets of ca. 1600 enzymes with 21 different functions
five different similarity measures of the active site
[Figure: active site of an enzyme]

From Structure to Function
The enzyme commission number

Ranking Enzymes
Quantifying enzyme function similarity
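[Figure: graph of functional similarity between the enzymes EC 2.7.7.12, EC 4.2.3.90, EC 2.7.7.34, EC 4.6.1.11, EC 2.7.1.12 and an unannotated query EC ?.?.?.?. Edge labels on the slide: 1, 0, 0, 3, 0, 2, 0, 2, 0.]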
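The enzyme pairs on this slide are consistent with a similarity that counts how many leading levels two EC numbers share: EC 2.7.7.12 and EC 2.7.7.34 share three levels, EC 2.7.7.12 and EC 2.7.1.12 share two. A minimal Python sketch under that assumption follows; the function name `ec_similarity` is hypothetical and does not come from the slides.

```python
def ec_similarity(ec_a, ec_b):
    """Count the leading EC levels shared by two EC numbers (0 to 4).

    Assumption: similarity = length of the common prefix of the EC hierarchy.
    An unknown level ("?") never counts as a match.
    """
    levels_a = ec_a.replace("EC", "").strip().split(".")
    levels_b = ec_b.replace("EC", "").strip().split(".")
    shared = 0
    for a, b in zip(levels_a, levels_b):
        if a != b or a == "?":
            break
        shared += 1
    return shared


# Pairs taken from the enzymes named on the slide.
sim_close = ec_similarity("EC 2.7.7.12", "EC 2.7.7.34")
sim_far = ec_similarity("EC 2.7.7.12", "EC 4.2.3.90")
```

With this definition, two enzymes with identical EC numbers get the maximum value 4, which matches the label range {0, 1, 2, 3, 4} used later in the talk.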

Ranking Enzymes
Conditional ranking of enzymes
Ranking enzymes
For an unannotated enzyme (the query), rank the annotated enzymes so that those at the top have a function similar to that of the query.
Minimize ranking error: number of switches needed for a perfect ranking
Example: suppose one has an enzyme with unknown function: EC ?.?.?.?
1 EC 2.7.7.12
2 EC 2.7.7.12
3 EC 2.7.7.34
4 EC 2.7.1.12
5 EC 2.7.7.34
6 EC 4.2.3.90
7 EC 1.14.11
8 EC 4.6.1.11
⇒ EC 2.7.7.12
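One common reading of "number of switches needed for a perfect ranking" is the number of pairs that appear in the wrong order. The sketch below counts those pairs for the eight-enzyme example. The labels are hypothetical: they assume the query really is EC 2.7.7.12 and that similarity is the number of shared leading EC levels. The name `ranking_error` is illustrative.

```python
from itertools import combinations


def ranking_error(labels_in_ranked_order):
    """Count pairs where a less similar item is ranked above a more similar one."""
    return sum(
        1
        for upper, lower in combinations(labels_in_ranked_order, 2)
        if upper < lower
    )


# Hypothetical similarity of each ranked enzyme to the query, top to bottom:
# 2.7.7.12, 2.7.7.12, 2.7.7.34, 2.7.1.12, 2.7.7.34, 4.2.3.90, 1.14.11, 4.6.1.11
example_labels = [4, 4, 3, 2, 3, 0, 0, 0]
example_error = ranking_error(example_labels)
```

Under these assumptions only positions 4 and 5 are out of order, so a single switch would give a perfect ranking.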

Learning to Rank
Learning the catalytic similarity
pair of enzymes: $e = (v, v')$
label $y_e \in \{0, 1, 2, 3, 4\}$: the catalytic similarity
five different structural similarities: $K^{\phi}(v, v')$

Catalytic similarity labels between enzymes A to G (entries for F and G are blank on the slide):

     A  B  C  D  E  F  G
A    4  4  0  0  0
B    4  4  0  0  0
C    0  0  4  2  1
D    0  0  2  4  3
E    0  0  1  3  4
F
G

Learning to Rank
Pairwise features with the Kronecker product
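[Figure: an object kernel on enzymes is combined into a pairwise kernel on pairs of enzymes, which is passed to a learning algorithm (e.g. SVM, RLS).]
The Kronecker kernel is defined as:
$$K^{\Phi}\big((v, v'), (\bar{v}, \bar{v}')\big) = K^{\Phi}(e, \bar{e}) = K^{\phi}(v, \bar{v})\, K^{\phi}(v', \bar{v}')$$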
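The Kronecker kernel can be illustrated with NumPy: the pairwise kernel matrix over all ordered pairs is the Kronecker product of the object kernel matrix with itself. The 3×3 kernel values below are made up for illustration, and `pair_kernel` is a hypothetical helper name.

```python
import numpy as np

# Hypothetical object kernel between three enzymes (symmetric similarities).
K = np.array([
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.3],
    [0.1, 0.3, 1.0],
])
n = K.shape[0]

# Pairwise kernel over all n * n ordered pairs.
# The pair (v_i, v_j) is stored at row/column index i * n + j.
K_pair = np.kron(K, K)


def pair_kernel(i, j, k, l):
    """Kronecker kernel between the pairs (v_i, v_j) and (v_k, v_l)."""
    return K[i, k] * K[j, l]
```

The explicit matrix has n² rows and columns, which is why it is usually only formed for small examples like this one.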

Learning to Rank
Basic pairwise models
Use training data $T = \{(e, y_e)\}$ to fit a model:
$$h(e) = \sum_{\bar{e} \in T} a_{\bar{e}}\, K^{\Phi}(e, \bar{e}).$$
The function $h \in \mathcal{H}$ can be fitted using the following optimisation problem:
$$\mathcal{A}(T) = \arg\min_{h \in \mathcal{H}} L(h, T) + \lambda \|h\|_{\mathcal{H}}^{2}.$$
For conditional ranking we choose an approximation of the rank loss.
This problem has time complexity $O(n^3)$, with $n$ the number of enzymes.
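To make the cubic cost concrete, here is a hedged sketch of regularized least squares (RLS) with a Kronecker kernel. It deliberately swaps in a plain squared-error loss instead of the rank-loss approximation mentioned on the slide, and it assumes a label is available for every pair of training enzymes. Under those assumptions the coefficients solve K A K + λA = Y, which one eigendecomposition of the n×n object kernel handles in O(n³). The names `fit_direct` and `fit_eigen` and all data are illustrative.

```python
import numpy as np


def fit_direct(K, Y, lam):
    """Solve (K kron K + lam * I) vec(A) = vec(Y) on the full n^2 x n^2 system."""
    n = K.shape[0]
    system = np.kron(K, K) + lam * np.eye(n * n)
    return np.linalg.solve(system, Y.ravel()).reshape(n, n)


def fit_eigen(K, Y, lam):
    """Same coefficients from one eigendecomposition of K, in O(n^3) time."""
    eigvals, U = np.linalg.eigh(K)
    Y_rot = U.T @ Y @ U                                  # rotate labels
    A_rot = Y_rot / (np.outer(eigvals, eigvals) + lam)   # element-wise solve
    return U @ A_rot @ U.T                               # rotate back


rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))                        # hypothetical enzyme features
K = X @ X.T                                        # positive semi-definite kernel
Y = rng.integers(0, 5, size=(5, 5)).astype(float)  # toy labels in {0, ..., 4}
lam = 1.0

A_direct = fit_direct(K, Y, lam)
A_eigen = fit_eigen(K, Y, lam)
predictions = K @ A_eigen @ K                      # h(e) for every training pair
```

Both functions return the same coefficient matrix; the direct version is only there as a check, since it needs a linear solve on an n²×n² system.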

Results
Qualitative improvement in the enzyme similarities
Example for CavBase structural similarity:
[Figure: similarity heat maps in three panels: Ground truth, Supervised, Unsupervised. Lighter color = higher similarity.]

Results
Improvement of the ROC curves
ROC curves for the five different structural similarity measures: unsupervised and supervised
[Figure: ROC curves for the different enzyme similarity measures of data set I. x-axis: false positive rate; y-axis: average true positive rate. Curves: CB, FP, LPCS, MCS and SW, each supervised (sup.) and unsupervised (unsup.).]
Improvement
Increase of AUC from ca. 0.7 to more than 0.8!
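For readers unfamiliar with AUC: for a single query it can be computed as the fraction of (relevant, irrelevant) pairs in which the relevant enzyme gets the higher score. The sketch below assumes binary relevance labels; the slides do not state how the graded similarity labels were binarised, so that part is an assumption. The scores are toy values, not results from the talk.

```python
def auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative item")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example: 1 = functionally related to the query, 0 = unrelated.
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
toy_auc = auc(toy_labels, toy_scores)
```

An AUC of 0.5 corresponds to a random ranking and 1.0 to a perfect one, which puts the reported move from ca. 0.7 to more than 0.8 in context.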

Conclusion
General conclusions
1 enzyme function prediction can nicely be cast in a conditional ranking framework
2 supervised ranking is a clear improvement upon the baseline
3 efficient enough for many bioinformatics applications
4 can be generalised to many other settings

Conclusion
Acknowledgements
Ghent University
Bernard De Baets
Willem Waegeman
University of Turku
Tapio Pahikkala
Antti Airola
University of Marburg
Thomas Fober
Eyke Hüllermeier
Want to know more?
[1] T. Pahikkala, A. Airola, M. Stock, B. De Baets, and W. Waegeman. Efficient regularized least-squares algorithms for conditional ranking on relational data. Machine Learning, 93(2-3):321–356, 2013.
[2] M. Stock, T. Fober, E. Hüllermeier, S. Glinca, G. Klebe, T. Pahikkala, A. Airola, B. De Baets, and W. Waegeman. Identification of functionally related enzymes by learning-to-rank methods. IEEE/ACM Transactions on Computational Biology and Bioinformatics, accepted for publication, 2014.

Title slide: Enzyme Annotation using Conditional Ranking Algorithms. Michiel Stock, KERMIT, Faculty of Bioscience Engineering, Ghent University, 6th of June 2014.
