Critical Assessment of Function Annotation, 2005

    Critical Assessment of Function Annotation, 2005 - Presentation Transcript

    1. A “Fair and Balanced” Assessment of Protein Function Prediction Servers
        • Iddo Friedberg, Martin Jambon, Andrei Osterman and Adam Godzik
    2. Outline
      • What is function?
      • Describing function
      • Describing function similarity
      • Selecting Targets
      • Assessing servers
      • Thoughts for the future
    3. What is Function?
      • Biochemical
      • Pathway/location
      • Phenotypic
      [Figure: function at three scales (10⁻⁹ m, 10⁻⁶ m, 1 m), illustrated by histidine ammonia-lyase (HAL, histidase), which converts L-histidine to urocanate + NH3; mutation phenotype: histidinemia (mental retardation and speech defect)]
    4. Outline
      • What is function?
      • Describing function
      • Describing function similarity
      • Selecting Targets
      • Assessing servers
      • Thoughts for the future
    5. Describing Function: From English to Keywords
      “HAL—which is the first enzyme in the degradation pathway of L-histidine—catalyzes the non-oxidative deamination of its substrate to trans-urocanic acid”. László Poppe (2001), COCB
    6. Describing Function: From English to Keywords
      “HAL—which is the first enzyme in the degradation pathway of L-histidine—catalyzes the non-oxidative deamination of its substrate to trans-urocanic acid”. László Poppe (2001), COCB
    7. Outline
      • What is function?
      • Describing function
      • Describing function similarity
      • Selecting Targets
      • Assessing servers
      • Thoughts for the future
    8. Keyword Similarity Measurements
      • Represent each document as a set of words
      • Use set operations to normalize the count of shared words by the size of the word sets
      • Problem: does not account for how often each word occurs in the corpus
      • Represent each document as a vector of words, weighted by frequency
      • Document similarity is based on the angle between the vectors
      • Problem: the dimensions are not really orthogonal (words are co-dependent)
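A minimal sketch of the two keyword measures, assuming naive whitespace tokenization. The slide only says the vector words are "weighted by frequency"; the inverse-document-frequency factor here is our choice (a common remedy for the corpus-count problem in the third bullet), and the function names and toy corpus are hypothetical.

```python
import math
from collections import Counter


def words(doc: str) -> list[str]:
    """Naive tokenization: lowercase and split on whitespace."""
    return doc.lower().split()


def jaccard(doc_a: str, doc_b: str) -> float:
    """Set-based similarity: shared words divided by all distinct words."""
    a, b = set(words(doc_a)), set(words(doc_b))
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def tfidf_vector(doc: str, corpus: list[str]) -> dict[str, float]:
    """Term frequency in `doc`, down-weighted for words common across `corpus`."""
    doc_sets = [set(words(d)) for d in corpus]
    vec = {}
    for word, tf in Counter(words(doc)).items():
        df = sum(word in s for s in doc_sets)  # number of documents containing the word
        # Smoothed IDF: a word present in every document keeps a weight of 1
        vec[word] = tf * (math.log((1 + len(corpus)) / (1 + df)) + 1)
    return vec


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine of the angle between two sparse word vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical keyword "documents" for three functions
corpus = [
    "histidine ammonia lyase activity",
    "aspartate ammonia lyase activity",
    "histidine kinase activity",
]
print(jaccard(corpus[0], corpus[1]))  # 0.6
print(jaccard(corpus[0], corpus[2]))  # 0.4
vecs = [tfidf_vector(d, corpus) for d in corpus]
print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))
```

On this toy corpus both measures rank the two ammonia lyases as more similar than histidine ammonia lyase and histidine kinase, yet neither would recognize that "histidase" and "histidine ammonia lyase" name the same enzyme.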
    9. Keywords and Semantics
      “HAL—which is the first enzyme in the degradation pathway of L-histidine—catalyzes the non-oxidative deamination of its substrate to trans-urocanic acid”. László Poppe (2001), COCB
      “Histidase catalyzes the elimination of the alpha-amino group of histidine using a 4-methylidene-imidazole-5-one (MIO), which is formed autocatalytically from the internal peptide segment Ala142-Ser-Gly.” Baedeker & Schulz (2002), Eur J Biochem
      • Keyword-based methods are semantically blind
      • What do we do if there are no shared keywords?
    10. Ontology: beyond keywords
      • Ontologies serve to establish the semantic function of words
      • Ontologies are 'specifications of a relational vocabulary': sets of defined terms like the sort that you would find in a dictionary, but the terms are networked (from the GO site)
    11. Enzyme Commission Classification
      Enzyme
        • EC1 oxidoreductases, EC2 transferases, EC3 hydrolases, EC4 lyases, EC5 isomerases, EC6 ligases
      EC4 lyases
        • EC4.1 carbon-carbon, EC4.2 carbon-oxygen, EC4.3 carbon-nitrogen, EC4.4 carbon-sulfur, EC4.6 phosphorus-oxygen, EC4.99 others
      EC4.3 carbon-nitrogen
        • EC 4.3.1 ammonia lyases, EC 4.3.3 amine lyases
      EC 4.3.1 ammonia lyases
        • EC 4.3.1.1 aspartate ammonia lyase, EC 4.3.1.2 methylaspartate ammonia lyase, EC 4.3.1.3 histidine ammonia lyase
    12. Enzyme Commission Classification
      EC4 lyases > EC4.3 carbon-nitrogen > EC 4.3.1 ammonia lyases > EC 4.3.1.3 histidine ammonia lyase
      • The EC classification provides a semantically accurate description by
      • 1) Using a controlled vocabulary
      • 2) Going from the general to the specific
      • 3) Defining the scope of interest
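Because EC numbers run from the general to the specific, a crude similarity between two enzymes is the number of leading EC levels they share. The sketch below only illustrates that property; it is not a measure proposed in the talk, and the function names are hypothetical. A '-' marks an unassigned level, as in EC 4.1.99.-.

```python
def ec_levels(ec: str) -> list[str]:
    """Split an EC number such as 'EC 4.3.1.3' into its levels: ['4', '3', '1', '3']."""
    return ec.replace("EC", "").strip().split(".")


def shared_ec_depth(ec_a: str, ec_b: str) -> int:
    """Count the leading EC levels two numbers share (0 = different classes, 4 = identical)."""
    depth = 0
    for a, b in zip(ec_levels(ec_a), ec_levels(ec_b)):
        # Stop at the first disagreement, or at an unassigned level written as '-'
        if a != b or a == "-":
            break
        depth += 1
    return depth


# Examples taken from the EC tree on slide 11
print(shared_ec_depth("EC 4.3.1.3", "EC 4.3.1.1"))  # 3: both ammonia lyases
print(shared_ec_depth("EC 4.3.1.3", "EC 4.3.3"))    # 2: both carbon-nitrogen lyases
print(shared_ec_depth("EC 4.3.1.3", "EC1"))         # 0: lyase vs oxidoreductase
```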
    13. Gene Ontology: Function Beyond Enzymes
      • GO describes function beyond the enzymatic
      • Three functional aspects are described:
        • Molecular function
        • Biological process
        • Cellular location
      • Terms are related by “is-a” and “part-of” relationships
    14. GO: molecular function of HAL
      Terms, from specific to general: histidine ammonia-lyase activity (GO:003456); ammonia-lyase activity; C-N lyase activity; lyase activity; Molecular Function
    15. GO: biological process of HAL
      Terms, from specific to general: histidine biosynthesis; histidine family biosynthesis; histidine metabolism; physiological process; cellular process; Biological process
    16. Outline
      • What is function?
      • Describing function
      • Describing function similarity
        • Keywords
        • Ontology
      • Selecting targets
      • Assessing servers
      • Thoughts for the future
    17. The Gene Ontology
      • GO is a hierarchy of terms, structured as a directed acyclic graph
      • Each node contains a single term, which belongs to one of the following aspects:
        • molecular function
        • cellular component
        • biological process
      • Problem: how do we create a distance/similarity measure?
      • Solution 1: measure the shortest path distance between terms
      [Figure: GO subgraph with His-NH3 lyase, Serine-NH3 lyase, NH3-lyase, C-S lyase, lyase activity, hydrolase and catalytic activity] d(His-NH3 lyase, C-S lyase) = 3
          • (Lord et al., Bioinformatics 2003)
    18. The Problem with Path Distance
      • Some terms are more informative than others: general terms are less informative
      • An absurd situation: d(His-NH3 lyase, C-S lyase) = 3, yet d(His-NH3 lyase, catalytic activity) = 3 as well
      • We need a measure based on term information content, not path distance
          • (Lord et al., Bioinformatics 2003)
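A minimal sketch of Solution 1 on the small subgraph from slides 17 and 18. The is-a edges are our reconstruction from the term labels in the figure (an assumption; the real GO graph is far larger), and edges are treated as undirected so that a path can climb to a shared parent and come back down. Both distances come out as 3, which is exactly the absurdity the slide points out.

```python
from collections import deque

# Hypothetical is-a edges (child -> parents), reconstructed from the figure labels
IS_A = {
    "His-NH3 lyase": ["NH3-lyase"],
    "Serine-NH3 lyase": ["NH3-lyase"],
    "NH3-lyase": ["lyase activity"],
    "C-S lyase": ["lyase activity"],
    "lyase activity": ["catalytic activity"],
    "hydrolase": ["catalytic activity"],
    "catalytic activity": ["Molecular Function"],
}


def undirected_neighbors(graph: dict[str, list[str]]) -> dict[str, set[str]]:
    """Adjacency sets with every is-a edge usable in both directions."""
    nbrs: dict[str, set[str]] = {}
    for child, parents in graph.items():
        for parent in parents:
            nbrs.setdefault(child, set()).add(parent)
            nbrs.setdefault(parent, set()).add(child)
    return nbrs


def path_distance(graph: dict[str, list[str]], a: str, b: str):
    """Fewest edges between terms a and b (breadth-first search); None if unconnected."""
    nbrs = undirected_neighbors(graph)
    seen = {a}
    queue = deque([(a, 0)])
    while queue:
        term, dist = queue.popleft()
        if term == b:
            return dist
        for nxt in nbrs.get(term, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


print(path_distance(IS_A, "His-NH3 lyase", "C-S lyase"))           # 3
print(path_distance(IS_A, "His-NH3 lyase", "catalytic activity"))  # 3
```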
    19. Information Content Based Distance
      • A concept occurs if its term, or any of its descendant terms, occurs
      • The probability of each concept, $p(c)$, therefore increases as we move towards the root
      • Let the set of shared ancestors of two terms (including the terms themselves) be $S(c_1, c_2)$
      • The probability of the minimal subsumer is:
        $p_{ms}(c_1, c_2) = \min_{c \in S(c_1, c_2)} p(c)$
      • The similarity is:
        $\mathrm{sim}(c_1, c_2) = -\log_2 p_{ms}(c_1, c_2)$
      Term probabilities: His-NH3 lyase p=0.000433; Serine-NH3 lyase p=0.0977; NH3-lyase p=0.124; C-S lyase p=0.0281; lyase activity p=0.15; hydrolase p=0.102; catalytic activity p=0.5; Molecular Function p=1
      $\mathrm{gosim}(\text{His-NH3 lyase}, \text{C-S lyase}) = -\log_2 p(\text{lyase activity}) = -\log_2 0.15 \approx 2.74$
      $\mathrm{gosim}(\text{His-NH3 lyase}, \text{catalytic activity}) = -\log_2 0.5 = 1$
          • (Lord et al., Bioinformatics 2003)
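The information-content measure can be sketched on the same subgraph. This follows Resnik's definition, which Lord et al. applied to GO: collect the shared ancestors, take the least probable one (the minimal subsumer), and return −log2 of its probability. The probabilities are copied from the slide; the parent links are our reconstruction of the figure (an assumption).

```python
import math

# Parent links (child -> is-a parents): our reconstruction of the slide's subgraph
PARENTS = {
    "His-NH3 lyase": ["NH3-lyase"],
    "Serine-NH3 lyase": ["NH3-lyase"],
    "NH3-lyase": ["lyase activity"],
    "C-S lyase": ["lyase activity"],
    "lyase activity": ["catalytic activity"],
    "hydrolase": ["catalytic activity"],
    "catalytic activity": ["Molecular Function"],
    "Molecular Function": [],
}

# Concept probabilities p(c), as given on the slide
P = {
    "His-NH3 lyase": 0.000433,
    "Serine-NH3 lyase": 0.0977,
    "NH3-lyase": 0.124,
    "C-S lyase": 0.0281,
    "lyase activity": 0.15,
    "hydrolase": 0.102,
    "catalytic activity": 0.5,
    "Molecular Function": 1.0,
}


def ancestors(term: str) -> set[str]:
    """The term itself plus every term reachable through is-a links."""
    found, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in found:
            found.add(t)
            stack.extend(PARENTS.get(t, []))
    return found


def resnik_sim(c1: str, c2: str) -> float:
    shared = ancestors(c1) & ancestors(c2)  # S(c1, c2)
    p_ms = min(P[c] for c in shared)        # probability of the minimal subsumer
    return -math.log2(p_ms)


print(round(resnik_sim("His-NH3 lyase", "C-S lyase"), 2))   # 2.74
print(resnik_sim("His-NH3 lyase", "catalytic activity"))    # 1.0
```

Unlike the path distance, the specific shared parent (lyase activity) now yields a higher similarity than the generic one (catalytic activity).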
    20. Outline
      • What is function?
      • Describing function
      • Describing function similarity
      • Selecting targets
      • Assessing servers
      • Thoughts for the future
    21. Selecting Targets
      • Ideally targets should be:
        • Without obvious homologs (sequence or structure)
        • Function experimentally determined
        • Not yet published
      Thanks to the target contributors: Adam Godzik (TBI, JCSG), Subramanian Sri Krishna (JCSG), Andrei Osterman (TBI), Michal Linial (UW / HUJI)
    22. Selecting Targets Cont'd
      • Thermotoga maritima: a hyperthermophilic deep-ocean eubacterium
      • The JCSG's organism of choice for genome-wide fold coverage; 1878 protein-coding genes
      • Only a few “feeler” assessment targets were picked
    23. Outline
      • What is function?
      • Describing function
      • Describing function similarity
      • Selecting targets
      • Assessing servers (target by target)
      • Thoughts for the future
    24. Assessed Servers
    25. Server Scorecard (empty)
      • Spearmint ___
      • RuleBase ___
      • AnnoLite ___
      • PFP ___
      • PhydBac ___
      • ProKnow ___
      • Proteome Analyst ___
      • GOPET ___
    26. T7: an aspartate oxidase or dehydrogenase?
      • Novel enzyme, aspartate dehydrogenase: a non-orthologous replacement for aspartate oxidase (TM1643 from T. maritima)
      • Function experimentally verified: Yang Z et al., J Biol Chem. 2003 278(10):8804-8
    27. # in cluster | FIG ID | Function | TM ID
        1 | ASPDH | Aspartate dehydrogenase [same functional role as EC 1.4.3.16] | TM1643*
        2 | QSYN | Quinolinate synthetase (EC 4.1.99.-) | TM1644
        3 | QAPRT | Quinolinate phosphoribosyltransferase [decarboxylating] (EC 2.4.2.19) | TM1645*
        *Structures solved
      Biological Process: NAD/NADP biosynthesis
      Summary: a genomic cluster conserved between T. maritima and several methanogenic archaea contains a novel gene* for the first step of NAD biosynthesis
      [Pathway figure: L-aspartate (I) is converted to iminoaspartate (II) either by ASPOX (FAD → FADH) or by ASPDH (NAD+ → NADH); QSYN combines II with dihydroxyacetone-P (III) to form quinolinic acid (IV); QAPRT then uses PRPP and releases PPi]
      Connecting intermediates: I, L-aspartate; II, iminoaspartate; III, dihydroxyacetone-P; IV, quinolinic acid
      Experimentally confirmed by: Z. Yang et al. (Toronto), “Aspartate Dehydrogenase, a Novel Enzyme Identified from Structural and Functional Studies of TM1643”, J. Biol. Chem., Vol. 278, 2003
    28. T7: GO Tree and Probabilities
      GO path: Molecular Function > Catalytic > Oxidoreductase > Oxidoreductase, CH-NH2 bonds > Oxidoreductase, CH-NH2 bonds, NAD/NADP acceptor > Aspartate dehydrogenase
      Sibling branch: Oxidoreductase, CH-NH2 bonds, oxygen acceptor > Aspartate oxidoreductase
      Values along the path, general to specific: p=1, gosim=0; p=0.26, gosim=192; p=0.03, gosim=484; p=8.65×10⁻⁴, gosim=1017; p=4.74×10⁻⁴, gosim=1104
      • Spearmint: homoserine dehydrogenase
      • AnnoLite: L-lactate dehydrogenase
      • ProKnow: oxidoreductase activity
      • PhydBac: nicotinate nuc. dephosphorylase
      • PFP: 3-5 nucleotide phosphodiesterase
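The gosim values on this slide and on the later GO-tree slides (39 and 45) appear to equal 100 × (−log2 p). That scaling factor is our inference from the reported numbers, not something the talk states; pairs whose p is shown with only one significant figure (such as p=0.03) match less closely, presumably because p was rounded for display. A quick check, with a hypothetical `gosim` helper:

```python
import math


def gosim(p: float) -> float:
    """Assumed scaling: 100 * (-log2 p), inferred from the slides' reported values."""
    return -100 * math.log2(p)


# (p, reported gosim) pairs copied from slides 28, 39 and 45
reported = [
    (8.65e-4, 1017), (4.74e-4, 1104),                            # slide 28
    (7.4e-3, 707),                                               # slide 39
    (0.072, 379), (0.052, 426), (2.32e-3, 875), (8.4e-5, 1353),  # slide 45
]
for p, value in reported:
    print(f"p={p:g}: computed {gosim(p):.1f}, reported {value}")
```

Each reported number sits just below the computed one, which would be consistent with truncation to an integer; that, too, is a guess.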
    29. T7: summary
      • A “new” enzyme: aspartate dehydrogenase
      • Non-orthologous replacement of aspartate oxidase
      • Experimentally characterized two years ago
      • Not yet in annotation databases: do we need a standard deposition system like GenBank (sequence), PDB (structure) or MIAME (microarray)?
      • Some servers' predictions are better: is GO the ideal similarity scale?
    30. T1: the little Thiamine synthesis enzyme who couldn't (and neither could we)
    31. T1: the Thiamine synthesis hypothesis (Beck & Downs, JB 1998)
    32. T1 is not involved directly in the Thiamine pathway
      • ApbE is present in bacteria with and without the thiamine biosynthesis pathway
      • No correlation with genomic presence of thiamine transporters (ThiBPQ or YuaJ)
      • Not present in the thi loci
      • Lack of the ApbC or ApbE protein results in a defect in Fe-S cluster metabolism (Downs, JB 2003)
    33. T1: ApbE (Beck & Downs, JB 1998)
    34. T1: a non-oligomerizing T-fold
      • T-fold: normally forms homo-oligomeric barrels
      [Figure: uricase and T1 structures compared, with the catalytic (active) site and the oligomerization region marked]
    35. T1: predictions
      • Cellular Process (Thiamine biosynthesis) found by: Spearmint, Rulebase, PFP, ProKnow
      • Molecular function (PFP, ProKnow): oxidoreductase
      • Molecular function (PFP): transferring glycosyl groups
      • Cellular process aspect: mis-annotation due to outdated annotations in the reference databases (Pfam, KEGG)
      • We know the structure, active site, fold type and possible pathway
      Surprise! We do not know the real function (yet)
    36. T1 - conclusions
      • It is possible to have extensive knowledge of a protein, including its structure and pathway, yet know little about its actual biochemical function
    37. T2
      • TM1622. ORFan.
      • MF: GTPase binding.
        • Evidence: structural similarity to GTPase binders: Mog1, PsbP
        • Genomic co-location with LepA (elongation factor Tu family), a GTPase
    38. T2 aligned to Ran GTPase (movie)
    39. T2 Predictions
      GO path: binding > protein binding > enzyme binding > GTPase binding > small GTPase binding
      Values along the path, general to specific: p=0.45, gosim=113; p=0.21, gosim=224; p=7.4×10⁻³, gosim=707; gosim=1024; gosim=1122
      • AnnoLite: actin binding; PIP(4,5) binding
      • PFP: Rab interactor
    40. T4: Pantothenate Kinase
      • TM0883
      • Pantothenate kinase activity GO:0004594 E.C. 2.7.1.33
      • Evidence: experimental (Brand & Strauss, JBC May 2005)
      • New protein: CoaX
    41. T4: Pantothenate Kinase, CoA synthesis
      [Figure: the universal CoA biosynthesis pathway from pantothenate (vitamin B5) through PANK, PPCS, PPCDC, PPAT and DPCK to CoA, with connections to fatty acid, central carbon and cysteine metabolism; isoforms PANK2, PANK3 and PPAT2 are also shown. Legend: boxes, enzymes; circles, substrates/products; arrows, “preferred” reaction directionality; color, organism in which the enzyme exists (H. sapiens, E. coli, both, or neither)]
    42. Cluster analysis of the predicted coaX gene in selected organisms (Brand, L. A. et al., J. Biol. Chem. 2005;280:20185-20188)
    43. T4: server predictions
      • ProKnow : kinase activity; ATP binding
      • Many servers erroneously provided “Bordetella pertussis Bvg accessory factor family”: an old Pfam annotation
      • The exact function can (almost) be inferred from genomic context:
        • T4 is clustered with other proteins in the CoA synthesis pathway in many organisms AND
        • There is no other recognizable pantothenate kinase in T. maritima's genome
    44. T5: Growth Arrest; GDNF Receptor SwissProt: GAS1_HUMAN
      • Contributed by Michal Linial
      • Cellular process:
        • Induces caspase-dependent apoptosis
        • Prevents the G0 → S cell-cycle transition
      • Location: membrane; rafts
      • Biochemistry: binds glial-derived neurotrophic factors (GDNF); GPI binding (personal communication, M. Linial)
      • No structure (Robetta model)
    45. T5: Growth Arrest; GDNF Receptor
      GO path: signal transducer > receptor activity > transmembrane receptor activity > hematopoietin/interferon-class cytokine receptor activity > GDNF receptor activity
      Values along the path, general to specific: p=0.134, gosim=289; p=0.072, gosim=379; p=0.052, gosim=426; p=2.32×10⁻³, gosim=875; p=8.4×10⁻⁵, gosim=1353
      • PFP: protein binding; receptor activity
    46. Server Scorecard (empty)
      • Spearmint ___
      • RuleBase ___
      • AnnoLite ___
      • PFP ___
      • PhydBac ___
      • ProKnow ___
      • Proteome Analyst ___
      • GOPET ___
    47. Server Scorecard (Full)
      • Spearmint ___
      • RuleBase ___
      • AnnoLite ___
      • PFP ___
      • PhydBac ___
      • ProKnow ___
      • Proteome Analyst ___
      • GOPET ___
    48. Thanks
      • Adam Godzik
      • Andrei Osterman
      • Michal Linial
      • Martin Jambon
      • Subramanian Sri Krishna
    49. Points for Discussion
      • Future target selection: a call for 2006
      • Assessment strategies
      • Annotation standards
      • Should servers standardize their output?
      • Towards meta-servers
      • PDB 500 unknowns: a collaborative website
    50. Points for Discussion: Target selection: a call for 2006
      • Experimentally verified
      • Not yet published?
      • Not trivially discernible
      • Promiscuous? (Yes / No)
      • Moonlighters? (Yes / No)
      • Categorize by functional aspects?
      • Categorize by input type (sequence / structure)?
      • Estimation of prediction difficulty?
    51. Points for Discussion: Assessment strategies
      • Ontology based / other?
      • Additional and new distance measures?
      • Different distance measures for different categories?
    52. The End

    Iddo, 2 months ago

    A talk I gave at the Automated Function Prediction …

    CC Attribution-ShareAlike License
