Using	  computa.onal	  predic.ons	  to	  improve	  literature-­‐based	  Gene	  Ontology	  annota.ons	                     ...
Attaining curation nirvana…•  Curation efficiency•  Annotation consistency•  Data accuracy
…is not easy!Annotation errors1.	  Mistakes	  in	  capturing	  	  	  	  	  	  	  	  	  	  	  	  the	  annota.on	  	  2.	  ...
Flavors of GO annotations                 1.  Literature-based – “Manual”                       Individually assigned by b...
Flavors of GO annotations       1.  Literature-based – “Manual”             Individually assigned by biocurators based on ...
CvManGO:Computational vs. Manual GO Annotations          M              ✓	                                              C ...
CvManGO:Computational vs. Manual GO Annotations                     M                                         ✗	          ...
CvManGO:   Computational vs. Manual GO Annotations                                     C   OK                M   =   C    ...
CvManGO:         Computational vs. Manual GO Annotations                                                                  ...
Discrepancies can identify genes that need updating                                                   100%"               ...
Extrapolating to the entire genome                                                          no                            ...
Factoring in the type of update      flagged for review"           no     resulting in potential   computational        imp...
Factoring in the type of update      flagged for review"           no     resulting in potential   computational        imp...
Attributes of flagged genes What	  are	  factors	  that	  enrich	  for	  genes	  missing	  annota.ons?	  •    Type	  of	  ...
Attributes of flagged genes What	  are	  factors	  that	  enrich	  for	  genes	  missing	  annota.ons?	  •    Type	  of	  ...
Analysis by Class of Discrepancies    M            Shallow class    C            Mismatch classM       C
Analysis by Class of Discrepancies                              % updatable    M            Shallow class       78.8%    C...
Types of annotation updates by class     Mismatch                                     Shallow          1842               ...
Summary & Conclusions•  Majority of S. cerevisiae literature-based GO annotations are good•  Comparing manual vs. computat...
Summary & Conclusions•  Majority of S. cerevisiae literature-based GO annotations are good•  Comparing manual vs. computat...
Future plans•  Identify predictive features of genes that need updating -  Are there specific GO terms used for manual cur...
GO	  Consor.um	  http://www.geneontology.org/
Saccharomyces	  Genome	  Database	  staff	      @yeastgenome                                    http://on.fb.me/ksgskb     ...
Upcoming SlideShare
Loading in …5
×

Using computational predictions to improve literature-based Gene Ontology annotations, Julie Park

252
-1

Published on

0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total Views
252
On Slideshare
0
From Embeds
0
Number of Embeds
0
Actions
Shares
0
Downloads
5
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide

Using computational predictions to improve literature-based Gene Ontology annotations, Julie Park

  1. 1. Using  computa.onal  predic.ons  to  improve  literature-­‐based  Gene  Ontology  annota.ons   Julie  Park,  Ph.D.   Saccharomyces  Genome  Database    •    hAp://www.yeastgenome.org/   Department  of  Gene.cs  •  Stanford  University  School  of  Medicine  
  2. 2. Attaining curation nirvana…•  Curation efficiency•  Annotation consistency•  Data accuracy
  3. 3. …is not easy!Annotation errors1.  Mistakes  in  capturing                        the  annota.on    2.  Outdated  informa.on    3.  Missing  annota.ons   How  can  you  find  these  errors?  
  4. 4. Flavors of GO annotations 1.  Literature-based – “Manual” Individually assigned by biocurators based on the published literature 2.  Computationally-predicted – “Computational” Automatically generated by in silico methods such as protein signatures or computational algorithms Sources of computational predictions in SGD InterPro 1 Swiss-Prot Keywords (SPKW) 2 YeastFunc 3 BioPixie 41. Camon, et al (2003) Genome Res. 13:662-722. http://www.ebi.ac.uk/GOA/Swiss-ProtKeyword2GO.html3. Tian, et al (2008) Genome Biol. 9 Suppl 1:S74. Huttenhower and Troyanskaya (2008) Bioinformatics. 24:i330-8
  5. 5. Flavors of GO annotations 1.  Literature-based – “Manual” Individually assigned by biocurators based on the published literature 2.  Computationally-predicted – “Computational” Automatically generated by in silico methods such as protein signatures or computational algorithms Sources of computational predictions in SGD InterPro 1 Swiss-Prot Keywords (SPKW) 2 YeastFunc 3 BioPixie 4Is  it  possible  to  take  advantage  of  the  strengths  of  computa=onal     predic=ons  and  leverage  these  annota=ons  to  improve  manual   ones?    
  6. 6. CvManGO:Computational vs. Manual GO Annotations M ✓   C ✓   ✓   M ✗   ✓   C ✓   M
  7. 7. CvManGO:Computational vs. Manual GO Annotations M ✗   C Do  discrepancies  between  a  literature-­‐based  annota.on     and  a  computa.onal  predic.on  indicate  that  the     manual  annota.on  needs  to  be  updated?    
  8. 8. CvManGO: Computational vs. Manual GO Annotations C OK M = C ✓   M MDiscrepancies ✗   No parent-child C M relationship C between terms
  9. 9. CvManGO: Computational vs. Manual GO Annotations C M = C M M 4379  genes   Reviewed     No parent-child 336  genes   C M relationship between terms Cfrom  October  2009  gene_associa=on.sgd  file    (6353  total  genes)  
  10. 10. Discrepancies can identify genes that need updating 100%" 90%" percentage of total genes reviewed" 80%" 70%" 60%" 50%" no change" 40%" updatable" 30%" 20%" 10%" 0%" Control   Genes  flagged  with   discrepant  annota.on  pairs  
  11. 11. Extrapolating to the entire genome no computational prediction" available" 15%" Not flagged " for review: " flagged for review:" no discrepancies" resulting in potential with computational improvement" predictions" 53%" 16%" flagged for review: " no change needed" 16%"S.ll  requires  reviewing  4379/6353  genes—can  we  narrow  this  down  
  12. 12. Factoring in the type of update flagged for review" no resulting in potential computational improvement" prediction" (add novel annotation)" available" 20%" 15%" Not flagged " for review: " no discrepancies" with computational predictions" flagged for review " 16%" resulting in potential improvement" (refinement and " flagged for review: " removal only)" no change 33%" needed" 16%"
  13. 13. Factoring in the type of update flagged for review" no resulting in potential computational improvement" prediction" (add novel annotation)" available" 20%" 15%" Not flagged " for review: " no discrepancies" with computational predictions" flagged for review " 16%" resulting in potential improvement" (refinement and " flagged for review: " removal only)" no change 33%" needed" 16%"
  14. 14. Attributes of flagged genes What  are  factors  that  enrich  for  genes  missing  annota.ons?  •  Type  of  discrepancy  •  GO  aspect  •  Amount  of  literature  for  a  gene  •  Source  of  the  computa.onal  predic.on  •  Number  of  computa.onal  sources  with  discrepancies  
  15. 15. Attributes of flagged genes What  are  factors  that  enrich  for  genes  missing  annota.ons?  •  Type  of  discrepancy  •  GO  aspect  •  Amount  of  literature  for  a  gene  •  Source  of  the  computa.onal  predic.on  •  Number  of  computa.onal  sources  with  discrepancies  
  16. 16. Analysis by Class of Discrepancies M Shallow class C Mismatch classM C
  17. 17. Analysis by Class of Discrepancies % updatable M Shallow class 78.8% C Mismatch class 59.2%M C
  18. 18. Types of annotation updates by class Mismatch Shallow 1842 5 16 32 1 4 138 20 24 16 15 20 14 Number of genes needing: annotation refinement annotation removal novel annotation addition
  19. 19. Summary & Conclusions•  Majority of S. cerevisiae literature-based GO annotations are good•  Comparing manual vs. computational prediction can identify genes whose annotations need updating•  Additional work needs to be done to pinpoint these annotations and genes
  20. 20. Summary & Conclusions•  Majority of S. cerevisiae literature-based GO annotations are good•  Comparing manual vs. computational prediction can identify genes whose annotations need updating•  Additional work needs to be done to pinpoint these annotations and genes It works but there is still work to do!
  21. 21. Future plans•  Identify predictive features of genes that need updating -  Are there specific GO terms used for manual curation more likely to be updated? -  Do specific computational predictions indicate a GO term should be updated? -  Examine node distance between GO terms used for computational and literature-based annotations -  Examine contribution of annotation date and new publications -  A combination of or all of the above?•  Evaluate the accuracy of computational predictions for S. cerevisiae•  Expand to evaluate annotations made based on orthology -  Annotations from GOC PAINT project•  Develop a pipeline for curation prioritization at SGD•  Extend to other annotation projects
  22. 22. GO  Consor.um  http://www.geneontology.org/
  23. 23. Saccharomyces  Genome  Database  staff   @yeastgenome http://on.fb.me/ksgskb sgd-helpdesk@lists.stanford.edu   Supported  by  NIH,  Na.onal  Human  Genome  Research  Ins.tute
  1. A particular slide catching your eye?

    Clipping is a handy way to collect important slides you want to go back to later.

×