Using computational predictions to improve literature-based Gene Ontology annotations, Julie Park
Upcoming SlideShare
Loading in...5
×
 

Like this? Share it with your network

Share

Using computational predictions to improve literature-based Gene Ontology annotations, Julie Park

on

  • 340 views

 

Statistics

Views

Total Views
340
Views on SlideShare
340
Embed Views
0

Actions

Likes
0
Downloads
2
Comments
0

0 Embeds 0

No embeds

Accessibility

Upload Details

Uploaded via as Adobe PDF

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment

Using computational predictions to improve literature-based Gene Ontology annotations, Julie Park Presentation Transcript

  • 1. Using  computa.onal  predic.ons  to  improve  literature-­‐based  Gene  Ontology  annota.ons   Julie  Park,  Ph.D.   Saccharomyces  Genome  Database    •    hAp://www.yeastgenome.org/   Department  of  Gene.cs  •  Stanford  University  School  of  Medicine  
  • 2. Attaining curation nirvana…•  Curation efficiency•  Annotation consistency•  Data accuracy
  • 3. …is not easy!Annotation errors1.  Mistakes  in  capturing                        the  annota.on    2.  Outdated  informa.on    3.  Missing  annota.ons   How  can  you  find  these  errors?  
  • 4. Flavors of GO annotations 1.  Literature-based – “Manual” Individually assigned by biocurators based on the published literature 2.  Computationally-predicted – “Computational” Automatically generated by in silico methods such as protein signatures or computational algorithms Sources of computational predictions in SGD InterPro 1 Swiss-Prot Keywords (SPKW) 2 YeastFunc 3 BioPixie 41. Camon, et al (2003) Genome Res. 13:662-722. http://www.ebi.ac.uk/GOA/Swiss-ProtKeyword2GO.html3. Tian, et al (2008) Genome Biol. 9 Suppl 1:S74. Huttenhower and Troyanskaya (2008) Bioinformatics. 24:i330-8
  • 5. Flavors of GO annotations 1.  Literature-based – “Manual” Individually assigned by biocurators based on the published literature 2.  Computationally-predicted – “Computational” Automatically generated by in silico methods such as protein signatures or computational algorithms Sources of computational predictions in SGD InterPro 1 Swiss-Prot Keywords (SPKW) 2 YeastFunc 3 BioPixie 4Is  it  possible  to  take  advantage  of  the  strengths  of  computa=onal     predic=ons  and  leverage  these  annota=ons  to  improve  manual   ones?    
  • 6. CvManGO:Computational vs. Manual GO Annotations M ✓   C ✓   ✓   M ✗   ✓   C ✓   M
  • 7. CvManGO:Computational vs. Manual GO Annotations M ✗   C Do  discrepancies  between  a  literature-­‐based  annota.on     and  a  computa.onal  predic.on  indicate  that  the     manual  annota.on  needs  to  be  updated?    
  • 8. CvManGO: Computational vs. Manual GO Annotations C OK M = C ✓   M MDiscrepancies ✗   No parent-child C M relationship C between terms
  • 9. CvManGO: Computational vs. Manual GO Annotations C M = C M M 4379  genes   Reviewed     No parent-child 336  genes   C M relationship between terms Cfrom  October  2009  gene_associa=on.sgd  file    (6353  total  genes)  
  • 10. Discrepancies can identify genes that need updating 100%" 90%" percentage of total genes reviewed" 80%" 70%" 60%" 50%" no change" 40%" updatable" 30%" 20%" 10%" 0%" Control   Genes  flagged  with   discrepant  annota.on  pairs  
  • 11. Extrapolating to the entire genome no computational prediction" available" 15%" Not flagged " for review: " flagged for review:" no discrepancies" resulting in potential with computational improvement" predictions" 53%" 16%" flagged for review: " no change needed" 16%"S.ll  requires  reviewing  4379/6353  genes—can  we  narrow  this  down  
  • 12. Factoring in the type of update flagged for review" no resulting in potential computational improvement" prediction" (add novel annotation)" available" 20%" 15%" Not flagged " for review: " no discrepancies" with computational predictions" flagged for review " 16%" resulting in potential improvement" (refinement and " flagged for review: " removal only)" no change 33%" needed" 16%"
  • 13. Factoring in the type of update flagged for review" no resulting in potential computational improvement" prediction" (add novel annotation)" available" 20%" 15%" Not flagged " for review: " no discrepancies" with computational predictions" flagged for review " 16%" resulting in potential improvement" (refinement and " flagged for review: " removal only)" no change 33%" needed" 16%"
  • 14. Attributes of flagged genes What  are  factors  that  enrich  for  genes  missing  annota.ons?  •  Type  of  discrepancy  •  GO  aspect  •  Amount  of  literature  for  a  gene  •  Source  of  the  computa.onal  predic.on  •  Number  of  computa.onal  sources  with  discrepancies  
  • 15. Attributes of flagged genes What  are  factors  that  enrich  for  genes  missing  annota.ons?  •  Type  of  discrepancy  •  GO  aspect  •  Amount  of  literature  for  a  gene  •  Source  of  the  computa.onal  predic.on  •  Number  of  computa.onal  sources  with  discrepancies  
  • 16. Analysis by Class of Discrepancies M Shallow class C Mismatch classM C
  • 17. Analysis by Class of Discrepancies % updatable M Shallow class 78.8% C Mismatch class 59.2%M C
  • 18. Types of annotation updates by class Mismatch Shallow 1842 5 16 32 1 4 138 20 24 16 15 20 14 Number of genes needing: annotation refinement annotation removal novel annotation addition
  • 19. Summary & Conclusions•  Majority of S. cerevisiae literature-based GO annotations are good•  Comparing manual vs. computational prediction can identify genes whose annotations need updating•  Additional work needs to be done to pinpoint these annotations and genes
  • 20. Summary & Conclusions•  Majority of S. cerevisiae literature-based GO annotations are good•  Comparing manual vs. computational prediction can identify genes whose annotations need updating•  Additional work needs to be done to pinpoint these annotations and genes It works but there is still work to do!
  • 21. Future plans•  Identify predictive features of genes that need updating -  Are there specific GO terms used for manual curation more likely to be updated? -  Do specific computational predictions indicate a GO term should be updated? -  Examine node distance between GO terms used for computational and literature-based annotations -  Examine contribution of annotation date and new publications -  A combination of or all of the above?•  Evaluate the accuracy of computational predictions for S. cerevisiae•  Expand to evaluate annotations made based on orthology -  Annotations from GOC PAINT project•  Develop a pipeline for curation prioritization at SGD•  Extend to other annotation projects
  • 22. GO  Consor.um  http://www.geneontology.org/
  • 23. Saccharomyces  Genome  Database  staff   @yeastgenome http://on.fb.me/ksgskb sgd-helpdesk@lists.stanford.edu   Supported  by  NIH,  Na.onal  Human  Genome  Research  Ins.tute