From ELMs to function: interaction networks and feature spaces


Published on

7th ELM meeting, ISG Hotel, Heidelberg, Germany, February 28, 2004


  1. From ELMs to Function: Interaction Networks and Feature Spaces. Lars Juhl Jensen, EMBL Heidelberg
  2. Function unknown for 40% of human proteins
  3. 1AOZ (129 aa) vs. 1PLC (99 aa)
     Scoring matrix: BLOSUM50; gap penalties: -12/-2
     15.5% identity; global alignment score: -23

     1AOZ SQIRHYKWEVEYMFWAPNCNENIVMGINGQFPGPTIRANAGDSVVVELTNKLHTEGVVIH
     1PLC ---------IDVLLGA---DDGSLAFVPSEFS-----ISPGEKIVFK-NNAGFPHNIVFD

     1AOZ WHGILQRGTPWADGTASISQCAINPGETFFYNFTVDNPGTFFYHGHLGMQRSAGLYGSLI
     1PLC EDSI-PSGVDASKISMSEEDLLNAKGETFEVALSNKGEYSFYCSPHQG----AGMVGKVT

     1AOZ VDPPQGKKE
     1PLC VN-------
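The 15.5% identity quoted for the 1AOZ/1PLC alignment can be checked directly from the two aligned rows. The following is a minimal sketch, assuming identity is counted as identical residue pairs divided by the total number of alignment columns (gaps included); the slide does not state which convention was used, and the function and variable names are illustrative. Under this assumption the rows give 20 identical columns out of 129.

```python
def percent_identity(row_a: str, row_b: str) -> float:
    """Percentage of alignment columns with the same residue in both rows."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    # A column counts as identical only if both rows hold the same residue
    # (gap characters never count as a match).
    identical = sum(1 for a, b in zip(row_a, row_b) if a == b and a != "-")
    return 100.0 * identical / len(row_a)


# Aligned rows copied from the slide (three blocks joined end to end).
AOZ_ROW = (
    "SQIRHYKWEVEYMFWAPNCNENIVMGINGQFPGPTIRANAGDSVVVELTNKLHTEGVVIH"
    "WHGILQRGTPWADGTASISQCAINPGETFFYNFTVDNPGTFFYHGHLGMQRSAGLYGSLI"
    "VDPPQGKKE"
)
PLC_ROW = (
    "---------IDVLLGA---DDGSLAFVPSEFS-----ISPGEKIVFK-NNAGFPHNIVFD"
    "EDSI-PSGVDASKISMSEEDLLNAKGETFEVALSNKGEYSFYCSPHQG----AGMVGKVT"
    "VN-------"
)

print(f"{percent_identity(AOZ_ROW, PLC_ROW):.1f}% identity "
      f"over {len(AOZ_ROW)} columns")
```

Other tools divide by the length of the shorter sequence or by the number of ungapped columns, which would give a higher figure for the same alignment.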
  4. Structural similarity can be deceiving: two structures from the Cupredoxin superfamily, one an enzyme and one a non-enzyme
  5. ProtFun: Prediction of protein function from post-translational modifications
  6. Protein features determine function

     # Functional category               1AOZ    1PLC
     Amino_acid_biosynthesis             0.126   0.070
     Biosynthesis_of_cofactors           0.100   0.075
     Cell_envelope                       0.429   0.032
     Cellular_processes                  0.057   0.059
     Central_intermediary_metabolism     0.063   0.041
     Energy_metabolism                   0.126   0.268
     Fatty_acid_metabolism               0.027   0.072
     Purines_and_pyrimidines             0.439   0.088
     Regulatory_functions                0.102   0.019
     Replication_and_transcription       0.052   0.089
     Translation                         0.079   0.150
     Transport_and_binding               0.032   0.052

     # Enzyme/nonenzyme                  1AOZ    1PLC
     Enzyme                              0.773   0.310
     Nonenzyme                           0.227   0.690

     # Enzyme class                      1AOZ    1PLC
     Oxidoreductase (EC 1.-.-.-)         0.077   0.077
     Transferase    (EC 2.-.-.-)         0.260   0.099
     Hydrolase      (EC 3.-.-.-)         0.114   0.071
     Lyase          (EC 4.-.-.-)         0.025   0.020
     Isomerase      (EC 5.-.-.-)         0.010   0.068
     Ligase         (EC 6.-.-.-)         0.017   0.017
  7. Feature-function correlations
       • Transmembrane helices predictive of
           • Receptors
           • Transporters
           • Ion channels
       • Subcellular localization
           • Receptors
           • Transcription (regulation)
       • S/T-phosphorylation
           • Transcription regulation
  8. ELMer hunting
     Bugs: “Heeeey, there's something awfly scwewy going on awound here”
       • The idea: compare the GO annotation of ELMs with the GO terms of ELM-containing proteins
           • Color shows the correlation between a GO term and ELM matches
           • Black dots denote annotated GO terms
       • A lack of correlation need not be a problem
       • But how come ...
           • LIG_Dynein_DLC8_1 is not annotated as intracellular protein transport?
           • LIG_TRP is not stress response?
           • LIG_WRPW_1 and 2 are not involved in cell differentiation and development?
           • MOD_ASX_betaOH_EGF is not cell differentiation (and perhaps development)?
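The slide does not say which correlation measure lies behind the colors. One plausible choice for two yes/no properties per protein (has an ELM match, carries a GO term) is the phi coefficient, which is the Pearson correlation of two binary indicators. The sketch below assumes that measure; the function name and the toy indicator vectors are made up for illustration.

```python
from math import sqrt


def phi_coefficient(has_elm, has_go):
    """Phi coefficient between two equally long sequences of 0/1 flags."""
    if len(has_elm) != len(has_go):
        raise ValueError("indicator vectors must have equal length")
    # Fill the 2x2 contingency table.
    n11 = n10 = n01 = n00 = 0
    for elm, go in zip(has_elm, has_go):
        if elm and go:
            n11 += 1
        elif elm:
            n10 += 1
        elif go:
            n01 += 1
        else:
            n00 += 1
    denom = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    # If a row or column of the table is empty the correlation is undefined;
    # report 0.0 in that case.
    return (n11 * n00 - n10 * n01) / denom if denom else 0.0


# Toy data: one flag per protein.
elm_match = [1, 1, 1, 0, 0, 0]
go_term = [1, 1, 0, 0, 0, 1]
print(round(phi_coefficient(elm_match, go_term), 3))
```

A value near +1 would mean the ELM matches and the GO term tend to occur in the same proteins, a value near 0 that they are unrelated.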
  9. And now for something completely different: Protein association networks
       • Genomic Neighborhood
       • Species Co-occurrence
       • Gene Fusions
       • Database Imports
       • Exp. Interaction Data
       • Co-expression
       • Literature co-occurrence
  10. Integrating physical interaction screens
       • All screens are not equal
           • Complex purification vs. Y2H
           • Quality varies greatly
       • All interactions within a screen are not equal
           • Quality measure for each type
           • Benchmarking against KEGG
       • Combination of evidence from multiple screens
       • Cross-species transfer of interaction evidence
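The slide does not give the formula used to combine evidence from several screens. A common approach, assumed here, is to treat each benchmarked score as the probability that the interaction is real and to combine independent sources as one minus the product of the individual failure probabilities. The function name is illustrative and the independence assumption is a simplification.

```python
from math import prod


def combine_scores(scores):
    """Combine per-screen probabilities, assuming the screens are independent."""
    scores = list(scores)
    for s in scores:
        if not 0.0 <= s <= 1.0:
            raise ValueError(f"score out of range: {s}")
    # The interaction is unsupported only if every screen is wrong.
    return 1.0 - prod(1.0 - s for s in scores)


# Toy example: the same protein pair seen in two screens of differing quality.
print(combine_scores([0.5, 0.5]))
print(combine_scores([0.9, 0.2, 0.1]))
```

With this rule a pair supported by several mediocre screens can outrank a pair seen once in a good screen, which is why the per-screen calibration step matters.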
  11. Mining microarray expression databases
       • Re-normalize arrays by modern method to remove biases
       • Build expression matrix
       • Combine similar arrays by PCA
       • Construct predictor by Gaussian kernel density estimation
       • Calibrate against KEGG maps
       • Transfer associations across species
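The predictor and calibration steps can be illustrated with a small standard-library sketch. It assumes, hypothetically, that each gene pair is summarized by a single co-expression value (for example a correlation coefficient), that densities are estimated separately for pairs that share a KEGG map and pairs that do not, and that the two are turned into a probability with Bayes' rule. The bandwidth, the prior, and the toy values are all invented; the slide does not specify the features or parameters actually used.

```python
from math import exp, pi, sqrt


def gaussian_kde(samples, bandwidth):
    """Return a density function built from samples with a Gaussian kernel."""
    if not samples or bandwidth <= 0:
        raise ValueError("need at least one sample and a positive bandwidth")
    norm = 1.0 / (len(samples) * bandwidth * sqrt(2.0 * pi))

    def density(x):
        return norm * sum(exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density


def association_probability(x, pos_density, neg_density, prior):
    """P(same pathway | x) from the two class densities via Bayes' rule."""
    p = prior * pos_density(x)
    n = (1.0 - prior) * neg_density(x)
    return p / (p + n) if p + n > 0 else prior


# Toy calibration sets: co-expression values for pairs on the same map
# and for pairs on different maps (hypothetical numbers).
same_map = [0.9, 0.8, 0.85, 0.7, 0.6]
different_map = [0.1, -0.2, 0.0, 0.3, -0.1, 0.2]
pos = gaussian_kde(same_map, bandwidth=0.15)
neg = gaussian_kde(different_map, bandwidth=0.15)

high = association_probability(0.8, pos, neg, prior=0.1)
low = association_probability(0.0, pos, neg, prior=0.1)
print(f"r=0.8 -> {high:.3f}, r=0.0 -> {low:.3f}")
```

The kernel density estimate avoids binning the scores, so the resulting probability varies smoothly with the co-expression value.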
  12. Co-mentioning in the scientific literature
       • Associate abstracts with species
       • Identify gene names in title/abstract
       • Count (co-)occurrences of genes
       • Test significance of associations
       • Calibrate against KEGG maps
       • Transfer associations across species
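The counting and significance steps can be sketched as follows. This assumes gene names have already been identified in each abstract, and it uses a one-sided hypergeometric test as the significance measure; the slide does not name the test that was actually applied. Gene identifiers and function names are placeholders.

```python
from collections import Counter
from itertools import combinations
from math import comb


def count_mentions(abstracts):
    """Count abstracts mentioning each gene and each unordered gene pair."""
    single, pair = Counter(), Counter()
    for genes in abstracts:
        genes = set(genes)
        single.update(genes)
        pair.update(frozenset(p) for p in combinations(sorted(genes), 2))
    return single, pair


def cooccurrence_pvalue(n_total, n_a, n_b, n_ab):
    """Hypergeometric P(at least n_ab co-mentions) if mentions were independent."""
    upper = min(n_a, n_b)
    tail = sum(
        comb(n_a, k) * comb(n_total - n_a, n_b - k)
        for k in range(n_ab, upper + 1)
    )
    return tail / comb(n_total, n_b)


# Toy corpus: each entry is the set of genes tagged in one abstract.
abstracts = [
    {"geneA", "geneB"},
    {"geneA", "geneB"},
    {"geneA", "geneB", "geneC"},
    {"geneC"},
    {"geneD"},
    {"geneD"},
    {"geneD", "geneA"},
    {"geneE"},
]
single, pair = count_mentions(abstracts)
key = frozenset({"geneA", "geneB"})
p_value = cooccurrence_pvalue(
    len(abstracts), single["geneA"], single["geneB"], pair[key]
)
print(single["geneA"], single["geneB"], pair[key], round(p_value, 4))
```

In a real corpus the raw p-values would still need the calibration against KEGG maps listed on the slide before they can be compared with other evidence types.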
  13. Extracting transient interactions through data integration
  14. Mining for ELM-mediated interactions
       • ELM pattern matching against D. melanogaster SP-proteome using species and domain filters
       • Assignment of SMART domains
       • Find pairs of proteins having a SMART domain and the corresponding ligand ELM
       • Overlay with Y2H protein interaction set by Curagen
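The pairing and overlay steps above can be sketched with regular-expression motif matching. Everything named here is hypothetical: `LIG_TOY_1`, its pattern, `TOY_DOMAIN`, and the protein identifiers are invented stand-ins, not real ELM or SMART entries, and the species and domain filters mentioned on the slide are not implemented.

```python
import re

# Hypothetical motif definitions and the domain each motif binds.
ELM_PATTERNS = {"LIG_TOY_1": re.compile(r"P..P")}
DOMAIN_FOR_ELM = {"LIG_TOY_1": "TOY_DOMAIN"}


def find_elms(sequence):
    """Names of all motifs whose pattern occurs in the sequence."""
    return {name for name, pat in ELM_PATTERNS.items() if pat.search(sequence)}


def candidate_pairs(proteins):
    """Pairs (motif protein, domain protein, motif) that could interact.

    proteins maps an identifier to (sequence, set of assigned domains).
    """
    pairs = set()
    for id_a, (seq_a, _domains_a) in proteins.items():
        for elm in find_elms(seq_a):
            for id_b, (_seq_b, domains_b) in proteins.items():
                if id_b != id_a and DOMAIN_FOR_ELM[elm] in domains_b:
                    pairs.add((id_a, id_b, elm))
    return pairs


def overlay(pairs, observed):
    """Keep only candidate pairs also present in an experimental interaction set."""
    return {p for p in pairs if frozenset(p[:2]) in observed}


# Toy proteome and toy Y2H interaction set.
proteins = {
    "P1": ("MAPLLPKK", set()),
    "P2": ("MKKLV", {"TOY_DOMAIN"}),
    "P3": ("MGGS", {"TOY_DOMAIN"}),
}
y2h = {frozenset({"P1", "P2"})}

candidates = candidate_pairs(proteins)
supported = overlay(candidates, y2h)
print(sorted(candidates))
print(sorted(supported))
```

Short patterns like the toy one match by chance in many sequences, which is the false-positive problem the filters and the experimental overlay are meant to reduce.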
  15. Summary: Have ELMs, want function
       • There is a huge potential in using ELMs in addition to domains for function prediction
       • Conservation of protein features (such as ELMs) in orthologs underlines their importance for protein function
       • Integration of ELMs with other evidence types can be used to extract likely (transient) ELM-mediated interactions
       • Work still remains to be done:
           • The false positive rate is still too high for predictive purposes
           • Better ELM models are needed
           • Better filters are needed
       • Wild and crazy ideas
           • Overlay SMART/ELM pairs with STRING predicted associations
           • Functional associations from ELM/SMART vectors
  16. Acknowledgments
       • You – the ELM team
       • DisEMBL team: Rune Linding, Francesca Diella, Peer Bork, Toby Gibson, Rob Russell
       • The STRING team: Peer Bork, Christian von Mering, Berend Snel, Martijn Huynen, Daniel Jaeggi, Steffen Schmidt
       • ArrayProspector: Julien Lagarde
       • NetView: Sean Hooper
       • The ProtFun team: Søren Brunak, Ramneek Gupta, Can Kesmir, Kristoffer Rapacki, Hans-Henrik Stærfeldt, Henrik Nielsen, Nikolaj Blom, Claus A.F. Andersen, Anders Krogh, Steen Knudsen, Chris Workman
       • The EUCLID team: Alfonso Valencia, Damien Devos, Javier Tamames
  17. Thank you!
