Signals of Evolution: Conservation, Specificity Determining Positions and Coevolution
Elin Teppa (1), Diego Zea (2), Morten Nielsen (1, 3) and Cristina Marino Buslje (1)
(1) Structural Bioinformatics Unit, Leloir Institute Foundation
(2) Structural Bioinformatics Group, National University of Quilmes
(3) Center for Biological Sequence Analysis, Department of Systems Biology, The Technical University of Denmark




INTRODUCTION

Protein sequences evolve under several constraints, and each constraint leaves a specific pattern of conservation and variation. In this study we focus on three major evolutionary signals: conservation, specificity determining positions and coevolution between residues. These signals result from different evolutionary mechanisms and have been used by different bioinformatics methods to predict functionally important sites.

Fully conserved positions in a Multiple Sequence Alignment (MSA) are interpreted as residues important for the structure and function of the protein. Early computational methods used this information to predict functionally important sites, including catalytic residues. Nowadays, more factors are taken into account to improve the performance of prediction methods.

Other positions show a subtler pattern of conservation: they are conserved within a group of sequences (sub-family) but may differ in another group. Such positions, named Specificity-Determining Positions (SDPs), are responsible for protein specificity, e.g. ligand binding or protein-protein interaction. Proteins can be classified into groups according to different criteria, e.g. sequence identity, phylogeny or functional similarity. SDPs are suggested to be located close to the catalytic residues, where they can carry out their role of defining substrate specificity.

Coevolution between residues is another signal that can be extracted from MSAs. Coevolution is the result of compensatory mutations: coevolving residues are those that have undergone concerted changes to overcome a common selection pressure. Owing to the limited amino acid diversity in the proximity of an active site, catalytic residues carry a particular signature defined by a close-proximity network of residues with high mutual information.

In summary, in this study we consider different methods that attempt to capture information from three different evolutionary signals. They have in common the prediction of functionally important sites and are capable of detecting the catalytic residues or pointing to residues near them. Disentangling the function of different positions in an alignment will allow us to create methods that take advantage of the different kinds of information contained in an alignment. Such methods could be used for deeper study of any protein and would also help to produce better, more accurate annotations of proteins of unknown function.

MATERIALS AND METHODS

The dataset was constructed from the Catalytic Site Atlas (CSA) database [1] and the Pfam database [2]. A total of 434 protein families, containing 1212 catalytic residues, were studied. For each family one reference PDB entry was selected, and the MSAs were prepared by removing redundant sequences at the level of 62% identity and trimming deletions and insertions across the whole alignment so as to preserve the continuity of the reference sequence. In addition, all positions with >50% gaps, as well as sequences covering <50% of the reference sequence length, were removed.

Conservation: The Kullback-Leibler conservation score was used.

Mutual Information: Mutual Information (MI) was calculated as described in [3]. MI gives a value for each pair of residues in an MSA. We calculated a cumulative Mutual Information score (cMI) for each residue as the sum of the MI values above a certain threshold over every amino acid pair in which that residue appears.

Evolutionary Tracing: The ET method identifies invariant and group-specific residues by partitioning the phylogenetic tree into subgroups of similar sequences [4]. The ET iv score represents conservation within groups in a qualitative way and predicts SDPs, whereas the ET rv score incorporates entropy as a quantitative measure of conservation, ranking positions by their relative importance.

SDPfox: This method predicts SDPs in a phylogeny-independent manner. It first identifies specificity groups by iteratively assigning each protein to a group until convergence. This classification allows the prediction of SDPs that end up separated on a phylogenetic tree [5].

XDET: This method implements the mutational behaviour algorithm, which compares the mutational behaviour of a position with that of the whole alignment. The principle is that positions showing a family-dependent conservation pattern would have a mutational behaviour similar to that of the whole family [6].

Proximity scores for each method were calculated as the sum of the scores of the residues within a distance of ≤ 6 Å of the given amino acid in the 3D structure. The predictive performance in detecting catalytic residues using the proximity scores was evaluated in terms of the area under the ROC curve (AUC) per family.

RESULTS AND DISCUSSION

We calculated the Spearman rank correlation between methods to find out whether they capture different pieces of information (Fig. 1). As expected, the analysis shows a strong correlation between ET rv and conservation, because ET rv includes conservation information in its score. Surprisingly, the correlation between ET iv, SDPfox and XDET is lower than expected. For the first two methods, this can be understood by the strong dependence of the results on the sequence clustering method, which is phylogenetically based for ET iv and functionally based for SDPfox. XDET, on the other hand, predicts SDPs by comparing the mutational behaviour of a position with the mutational trend of the family. Such an approach may detect important positions (as they behave like the evolution of the whole family), but this is not enough evidence to attribute their biological importance to the determination of the specificity of the enzyme.

We also analysed the extent to which the top-scoring residues predicted by the different methods overlap. We took the best N scores for each method, where N is equal to 10% of the total length of the sequence. Figure 2 shows the average overlap for the 434 families. Except for ET rv with conservation and ET iv, the other methods differ in which residues they rank as most important for the family.

Figure 1: Heat map of the Spearman rank correlation between methods.

Figure 2: Average percentage of residues predicted in common between methods, considering the top 10% ranked positions.

As an example, Fig. 3 shows the highest scores for the Phosphofructokinase 1 family mapped onto the 3D structure of the reference protein.

Figure 3: Mapping of the predicted functionally important sites using six different prediction scores. The cartoon representation of PDB 1PFK is plotted. The top 10% prediction scores are shown in green. The catalytic residues are shown as red sticks, and the experimentally known SDPs as blue sticks.

Table 1: Predictive performance for detecting catalytic residues in terms of AUC on the 434 Pfam entries.

Method      Predictive performance (AUC)
pC          0.83899
pMI         0.80342
pET rv      0.86774
pET iv      0.63360
pSDPfox     0.63602

We demonstrate that the methods capture different information and assign their highest scores to different residue positions. An exception is the ET rv score, which shows a strong correlation with conservation.

The pET rv, pC and pMI scores performed well in detecting catalytic residues. However, only pMI could be combined with other scores to improve the prediction of catalytic residues, because it has a low correlation with the other measures.

A weakness of the SDP prediction methods is that some conserved positions could mask SDPs, which would be detected if more sequences became available for the family.

There is no publicly available SDP database, which hinders direct testing of methods for their prediction.
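The cumulative MI score, the proximity score and the per-family AUC described in Materials and Methods can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used for the poster: the function names are hypothetical, the MI matrix is assumed to be precomputed (e.g. as in [3]), each residue is assumed to be represented by a single 3D coordinate, and the residue itself is assumed to be excluded from its own proximity sum, which the text does not specify.

```python
import math


def cumulative_mi(mi, threshold):
    """cMI per residue: sum of MI values above `threshold` over all pairs
    that contain the residue. `mi` is a symmetric square matrix (assumed
    to be precomputed); the diagonal is ignored."""
    n = len(mi)
    return [
        sum(mi[i][j] for j in range(n) if j != i and mi[i][j] > threshold)
        for i in range(n)
    ]


def proximity_scores(scores, coords, cutoff=6.0):
    """For each residue, sum the scores of the other residues whose
    coordinate lies within `cutoff` angstroms (<= 6 A in the text).
    Assumption: the residue itself is not counted."""
    n = len(scores)
    return [
        sum(
            scores[j]
            for j in range(n)
            if j != i and math.dist(coords[i], coords[j]) <= cutoff
        )
        for i in range(n)
    ]


def roc_auc(scores, is_catalytic):
    """Area under the ROC curve for one family, computed as the probability
    that a catalytic residue outscores a non-catalytic one (ties count 0.5)."""
    pos = [s for s, c in zip(scores, is_catalytic) if c]
    neg = [s for s, c in zip(scores, is_catalytic) if not c]
    if not pos or not neg:
        raise ValueError("need at least one catalytic and one other residue")
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg
    )
    return wins / (len(pos) * len(neg))


# Toy family: four residues on a line, the second one catalytic.
toy_scores = [1, 2, 3, 4]
toy_coords = [(0, 0, 0), (4, 0, 0), (8, 0, 0), (20, 0, 0)]
toy_prox = proximity_scores(toy_scores, toy_coords)
toy_auc = roc_auc(toy_prox, [False, True, False, False])
```

In the toy example the catalytic residue is the only one with two neighbours inside the cutoff, so its proximity score is the highest and the AUC is maximal. Averaging such per-family AUC values over all families would give a figure comparable in kind to those in Table 1.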

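The two comparisons reported in Results and Discussion, the Spearman rank correlation between methods and the overlap of their top 10% ranked positions, can be illustrated with the following sketch. It is a hedged example rather than the authors' code: the function names are hypothetical, ties receive average ranks, N is taken as 10% of the sequence length rounded down (with a minimum of one position), and the overlap is expressed as a percentage of N.

```python
import math


def average_ranks(values):
    """Ranks from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant input."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def top_positions(scores, fraction=0.10):
    """Indices of the best N scores, N = fraction of the sequence length."""
    n_top = max(1, int(len(scores) * fraction))
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(order[:n_top])


def top_overlap_percent(scores_a, scores_b, fraction=0.10):
    """Percentage of top-ranked positions shared by two methods."""
    top_a = top_positions(scores_a, fraction)
    top_b = top_positions(scores_b, fraction)
    return 100.0 * len(top_a & top_b) / len(top_a)


# Two hypothetical per-position score vectors for a 20-residue protein.
method_a = list(range(20))
method_b = list(range(20))
method_b[18] = -1  # the second method demotes one of the top positions
rho = spearman(method_a, method_b)
shared = top_overlap_percent(method_a, method_b)
```

Computing `spearman` for every pair of methods gives the kind of matrix that a heat map such as Figure 1 displays, and averaging `top_overlap_percent` over families corresponds to the quantity described for Figure 2.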
The SDP prediction methods, even with their different approaches, share the use of conserved amino acids as indicators of likely functional significance. In this context, coevolution is less representative of the global evolution of a whole family or subfamily; it thus provides information on specific events that required a common adaptation of two or more residues, and it can be detected even in phylogenetically divergent families.

REFERENCES

1. Porter, C.T., et al., The Catalytic Site Atlas: a resource of catalytic sites and residues identified in enzymes using structural data. Nucl. Acids Res., 2004. 32(suppl_1): p. D129-133.
2. Finn, R.D., et al., The Pfam protein families database. Nucl. Acids Res., 2008. 36(suppl_1): p. D281-288.
3. Buslje, C.M., et al., Correction for phylogeny, small number of observations and data redundancy improves the identification of coevolving amino acid pairs using mutual information. Bioinformatics, 2009. 25(9): p. 1125-1131.
4. Lichtarge, O., et al., A family of evolution-entropy hybrid methods for ranking protein residues by importance. J. Mol. Biol., 2004. 336: p. 1265-1282.
5. Kalinina, O.V., et al., An automated stochastic approach to the identification of the protein specificity determinants and functional subfamilies. Algorithms Mol. Biol., 2010. 5: 29.
6. Del Sol, A., et al., Automatic methods for predicting functionally important residues. J. Mol. Biol., 2003. 326(4): p. 1289-1302.

08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAG
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 

Signals of Evolution: Conservation, SDPs, Coevolution

  • 1. Signals of Evolution: Conservation, Specificity Determining Positions and Coevolution

Elin Teppa 1, Diego Zea 2, Morten Nielsen 1,3 and Cristina Marino Buslje 1

1 Structural Bioinformatics Unit, Leloir Institute Foundation
2 Structural Bioinformatics Group, National University of Quilmes
3 Center for Biological Sequence Analysis, Department of Systems Biology, The Technical University of Denmark

INTRODUCTION

Protein sequences evolve under several constraints, and each constraint leads to a specific pattern of conservation and variation in protein sequences. In this study we focused on the analysis of three major evolutionary signals: conservation, specificity determining positions and coevolution between residues. These signals are the results of different evolutionary mechanisms and have been used by different bioinformatics methods to predict functionally important sites.

Fully conserved positions in a Multiple Sequence Alignment (MSA) are interpreted as residues important for the structure and function of the protein. Early computational methods used this information to predict functionally important sites, including catalytic residues. Nowadays, more factors are taken into account to improve the performance of prediction methods.

Other positions show a more subtle pattern of conservation: they are conserved within a group of sequences (sub-family) but may change in another group. Such positions, named Specificity-Determining Positions (SDPs), are responsible for protein specificity, e.g. ligand binding or protein-protein interaction. The classification of proteins into groups can be defined according to different criteria, e.g. identity, phylogeny or functional similarity, among others. SDPs are suggested to be located in the proximity of the catalytic residues in order to carry out their role of defining the substrate specificity.

Coevolution between residues is another signal that can be extracted from MSAs. Coevolution is the result of compensatory mutations: coevolving residues are those that have undergone concerted changes to overcome a common selection pressure. Owing to the limited amino acid diversity in the proximity of an active site, the catalytic residues carry a particular signature defined by a close-proximity network of residues with high mutual information.

In summary, in this study we consider different methods that attempt to capture information from three different evolutionary signals. They have in common the prediction of functionally important sites and are capable of detecting the catalytic residues or of pointing to the residues near them. Disentangling the function of different positions in an alignment will allow us to create methods that take advantage of the different information contained in an alignment. Such methods could be used for the deeper study of any protein. They would also help to produce better and more accurate annotations of proteins with unknown function.

MATERIALS AND METHODS

The dataset was constructed from the Catalytic Site Atlas (CSA) database [1] and the Pfam database [2]. A total of 434 protein families, which together contain 1212 catalytic residues, were studied.

For a given family, one reference PDB entry was selected and the MSAs were prepared by removing redundant sequences at the level of 62% identity and trimming deletions and insertions across the whole alignment so as to preserve the continuity of the reference sequence. In addition, all positions with >50% gaps, as well as sequences covering <50% of the reference sequence length, were removed.

Conservation: The Kullback-Leibler conservation score was used.

Mutual Information: Mutual Information (MI) was calculated as described in [3]. MI gives a value for each pair of residues in an MSA. We calculated a cumulative Mutual Information score (cMI) for each residue as the sum of the MI values above a certain threshold over every amino acid pair in which that residue appears.

Evolutionary Tracing: The ET method identifies invariant specific residues by partitioning the phylogenetic tree into subgroups of similar sequences [4]. The ET iv score represents conservation within groups in a qualitative way and predicts SDPs, whereas the ET rv score incorporates entropy as a quantitative measure of conservation, giving a rank of positions by their relative importance.

SDPfox: This method predicts SDPs in a phylogeny-independent manner. It first identifies specificity groups by assigning each protein to a group iteratively until convergence. This classification allows the prediction of SDPs that end up separated on a phylogenetic tree [5].

XDET: This method implements the mutational behaviour algorithm, based on the comparison of the mutational behaviour of a position with the mutational behaviour of the whole alignment. The principle is that positions showing a family-dependent conservation pattern would have a mutational behaviour similar to that of the whole family [6].

Proximity scores for each method were calculated as the sum of the scores of the residues within a distance ≤ 6 Å in the 3D structure of the given amino acid. The predictive performance in detecting catalytic residues using the proximity scores was evaluated in terms of the area under the ROC curve (AUC) per family.

RESULTS AND DISCUSSION

We calculated the Spearman rank correlation between methods to find out whether they capture different pieces of information (Fig. 1). As expected, the analysis shows a strong correlation between ET rv and conservation, because the former includes conservation information in its score. Surprisingly, the correlation between ET iv, SDPfox and XDET is lower than expected. For the first two methods, this can be understood by the strong dependence of the results on the sequence clustering method, which is phylogenetically based for ET iv and functionally based for SDPfox. The prediction of SDPs by XDET, on the other hand, is based on the comparison of the mutational behavior of a position with the mutational trend of the family. Such an approach may detect important positions (as they have the same behavior as the evolution of the whole family), but this is not enough evidence to assign them a role in determining the specificity of the enzyme.

We also analyzed to what extent the top-scoring residues predicted by the different methods overlap. We took into account the best N scores for each method, where N is equal to 10% of the total length of the sequence. Figure 2 shows the average of the overlapping residues for the 434 families. Except for ET rv with conservation and ET iv, the methods differ in which residues are most important for the family.

Figure 1: Heat map of the Spearman rank correlation between methods.

Figure 2: Average percentage of residues predicted in common between methods, considering the top 10% ranked positions.

As an example, Fig. 3 illustrates the highest scores of the Phosphofructokinase 1 family mapped onto the 3D structure of the reference protein.

Figure 3: Mapping of the predicted functionally important sites using six different prediction scores. Plotted is the cartoon representation of PDB: 1PFK. The top 10% prediction scores are represented in green. The catalytic residues are shown as red sticks, and the experimentally known SDPs are shown as blue sticks.

Predictive method    Performance (AUC)
pC                   0.83899
pMI                  0.80342
pET rv               0.86774
pET iv               0.63360
pSDPfox              0.63602

Table 1: Predictive performance for detecting catalytic residues in terms of AUC value on the 434 Pfam entries.

We demonstrate that the methods capture different information and assign their highest scores to different residue positions. An exception is the ET rv score, which shows a strong correlation with conservation.

The pET rv, pCons and pMI scores have shown good performance in detecting catalytic residues. However, only pMI could be combined with other scores to improve the prediction of catalytic residues, because it has a low correlation with the other measures.

A weakness of the SDP prediction methods is that some conserved positions could mask SDPs, which would be detected if more sequences became available for the family.

There is no publicly available SDP database, which hinders the direct testing of methods for their prediction.

The SDP prediction methods, even with their different approaches, share the use of conserved amino acids as indicators of likely functional significance. In this context, coevolution is less representative of the global evolution of a whole family or subfamily; it instead provides information on specific events that required a common adaptation of two or more residues, and it can be detected even in phylogenetically divergent families.

REFERENCES

1 Porter, C.T., et al., The Catalytic Site Atlas: a resource of catalytic sites and residues identified in enzymes using structural data. Nucl. Acids Res., 2004. 32(suppl_1): p. D129-133.
2 Finn, R.D., et al., The Pfam protein families database. Nucl. Acids Res., 2008. 36(suppl_1): p. D281-288.
3 Buslje, C.M., et al., Correction for phylogeny, small number of observations and data redundancy improves the identification of coevolving amino acid pairs using mutual information. Bioinformatics, 2009. 25(9): p. 1125-1131.
4 Lichtarge, O., et al., A family of Evolution-Entropy Hybrid Methods for ranking protein residues by importance. J. Mol. Biol., 2004. 336: p. 1265-82.
5 Kalinina, O.V., et al., An automated stochastic approach to the identification of the protein specificity determinants and functional subfamilies. AMB, 2010: p. 5-29.
6 Del Sol, A., et al., Automatic Methods for Predicting Functionally Important Residues. J. Mol. Biol., 2003. 326(4): p. 1289-1302.
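
The conservation, cumulative MI and proximity scores described in Materials and Methods can be illustrated with a minimal Python sketch. It is not the authors' implementation. The uniform amino acid background, the exclusion of a residue from its own neighbourhood, the use of one coordinate per residue, and all function and parameter names are assumptions made for illustration. MI values are taken as given input (the corrected MI of [3] is not reimplemented), and the threshold is left as a parameter because the poster does not state its value.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kl_conservation(column, background=None):
    """Kullback-Leibler divergence (in nats) of one alignment column
    from a background amino acid distribution."""
    if background is None:
        # Assumption: uniform background; the poster does not specify one.
        background = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
    # Gaps and any symbol missing from the background are skipped.
    residues = [aa for aa in column if aa in background]
    if not residues:
        return 0.0
    counts = Counter(residues)
    total = len(residues)
    return sum((n / total) * math.log((n / total) / background[aa])
               for aa, n in counts.items())


def cumulative_mi(mi, n_positions, threshold):
    """cMI per position: sum of the MI values above `threshold` over
    every pair in which the position appears.

    `mi` maps (i, j) position pairs, each listed once, to an MI value.
    """
    cmi = [0.0] * n_positions
    for (i, j), value in mi.items():
        if value > threshold:
            cmi[i] += value
            cmi[j] += value
    return cmi


def proximity_scores(scores, coords, cutoff=6.0):
    """For each residue, sum the scores of the other residues lying
    within `cutoff` angstroms of it.

    Assumption: `coords` holds one (x, y, z) point per residue, and the
    residue's own score is not counted.
    """
    result = []
    for i, ci in enumerate(coords):
        total = 0.0
        for j, cj in enumerate(coords):
            if i != j and math.dist(ci, cj) <= cutoff:
                total += scores[j]
        result.append(total)
    return result
```

Under these assumptions, a proximity version of any per-residue score is obtained by passing that score list and the residue coordinates to `proximity_scores`, for example `proximity_scores(cumulative_mi(mi, n, threshold), coords)` for a pMI-like score.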
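
The three evaluations used in Results and Discussion (Spearman rank correlation between methods, overlap of the top 10% ranked positions, and area under the ROC curve per family) can be sketched in the same hedged way. The function names, the average-rank treatment of ties, the rounding of N with a minimum of one position, and the pairwise formulation of the AUC are assumptions for illustration only; the poster does not describe how these quantities were computed.

```python
import math


def average_ranks(values):
    """1-based ranks in ascending order; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        return float("nan")  # undefined when one score vector is constant
    return cov / (sx * sy)


def top_positions(scores, fraction=0.10):
    """Indices of the best N scores, N being `fraction` of the sequence length."""
    n = max(1, round(len(scores) * fraction))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(ranked[:n])


def overlap_percentage(scores_a, scores_b, fraction=0.10):
    """Percentage of top-ranked positions shared by two methods."""
    top_a = top_positions(scores_a, fraction)
    top_b = top_positions(scores_b, fraction)
    return 100.0 * len(top_a & top_b) / len(top_a)


def roc_auc(scores, is_catalytic):
    """AUC as the fraction of (catalytic, non-catalytic) residue pairs in
    which the catalytic residue scores higher; ties count one half."""
    pos = [s for s, c in zip(scores, is_catalytic) if c]
    neg = [s for s, c in zip(scores, is_catalytic) if not c]
    if not pos or not neg:
        raise ValueError("need at least one residue of each class")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

Applied per family, `spearman` and `overlap_percentage` take the per-position scores of two methods, while `roc_auc` takes one method's scores together with a list of booleans marking the annotated catalytic residues. Averaging the per-family values would then give figures of the kind summarised in Fig. 1, Fig. 2 and Table 1.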