SlideShare a Scribd company logo
1 of 23
Download to read offline
Using	
  computa.onal	
  predic.ons	
  to	
  improve	
  
literature-­‐based	
  Gene	
  Ontology	
  annota.ons	
  




                                    Julie	
  Park,	
  Ph.D.	
  
    Saccharomyces	
  Genome	
  Database	
  	
  •	
  	
  hAp://www.yeastgenome.org/	
  
     Department	
  of	
  Gene.cs	
  •	
  Stanford	
  University	
  School	
  of	
  Medicine	
  
Attaining curation nirvana…



•  Curation efficiency

•  Annotation consistency

•  Data accuracy
…is not easy!


Annotation errors
1.	
  Mistakes	
  in	
  capturing	
  	
  
	
  	
  	
  	
  	
  	
  	
  	
  	
  	
  the	
  annota.on	
  
	
  
2.	
  Outdated	
  informa.on	
  
	
  
3.	
  Missing	
  annota.ons	
  




               How	
  can	
  you	
  find	
  these	
  errors?	
  
Flavors of GO annotations
                 1.  Literature-based – “Manual”
                       Individually assigned by biocurators based on the
                       published literature

                 2.  Computationally-predicted – “Computational”
                       Automatically generated by in silico methods such as
                       protein signatures or computational algorithms

                    Sources of computational predictions in SGD
                        InterPro 1
                        Swiss-Prot Keywords (SPKW) 2
                        YeastFunc 3
                        BioPixie 4


1.   Camon, et al (2003) Genome Res. 13:662-72
2.   http://www.ebi.ac.uk/GOA/Swiss-ProtKeyword2GO.html
3.   Tian, et al (2008) Genome Biol. 9 Suppl 1:S7
4.   Huttenhower and Troyanskaya (2008) Bioinformatics. 24:i330-8
Flavors of GO annotations
       1.  Literature-based – “Manual”
             Individually assigned by biocurators based on the
             published literature

       2.  Computationally-predicted – “Computational”
             Automatically generated by in silico methods such as
             protein signatures or computational algorithms

          Sources of computational predictions in SGD
             InterPro 1
             Swiss-Prot Keywords (SPKW) 2
             YeastFunc 3
             BioPixie 4


Is	
  it	
  possible	
  to	
  take	
  advantage	
  of	
  the	
  strengths	
  of	
  computa=onal	
  	
  
  predic=ons	
  and	
  leverage	
  these	
  annota=ons	
  to	
  improve	
  manual	
  
                                               ones?	
  	
  
CvManGO:
Computational vs. Manual GO Annotations


          M              ✓	
  
                                            C
                 ✓	
             ✓	
  
          M
                                    ✗	
  
                 ✓	
                        C
                            ✓	
  
          M
CvManGO:
Computational vs. Manual GO Annotations




                     M
                                         ✗	
  
                                                    C



  Do	
  discrepancies	
  between	
  a	
  literature-­‐based	
  annota.on	
  	
  
        and	
  a	
  computa.onal	
  predic.on	
  indicate	
  that	
  the	
  	
  
            manual	
  annota.on	
  needs	
  to	
  be	
  updated?	
  	
  
CvManGO:
   Computational vs. Manual GO Annotations

                                     C
   OK
                M   =   C
    ✓	
  
                                     M


                    M
Discrepancies

     ✗	
  
                                No parent-child
                    C       M   relationship      C
                                between terms
CvManGO:
         Computational vs. Manual GO Annotations

                                                                                 C
                                                M               =   C

                                                                                 M


                                                            M
             4379	
  genes	
  

              Reviewed	
  	
  
                                                                            No parent-child
              336	
  genes	
                                    C       M   relationship
                                                                            between terms
                                                                                              C
from	
  October	
  2009	
  gene_associa=on.sgd	
  file	
  	
  
(6353	
  total	
  genes)	
  
Discrepancies can identify genes that need updating
                                                   100%"


                                                   90%"




             percentage of total genes reviewed"
                                                   80%"


                                                   70%"


                                                   60%"


                                                   50%"                                                         no change"

                                                   40%"
                                                                                                                updatable"

                                                   30%"


                                                   20%"


                                                   10%"


                                                    0%"

                                                           Control	
         Genes	
  flagged	
  with	
  
                                                                         discrepant	
  annota.on	
  pairs	
  
Extrapolating to the entire genome


                                                          no
                                                     computational
                                                       prediction"
                                                       available"
                                                         15%"
                                                                 Not flagged "
                                                                 for review: "
                             flagged for review:"              no discrepancies"
                            resulting in potential           with computational
                               improvement"                      predictions"
                                    53%"                            16%"

                                                          flagged for
                                                            review: "
                                                          no change
                                                           needed"
                                                             16%"




S.ll	
  requires	
  reviewing	
  4379/6353	
  genes—can	
  we	
  narrow	
  this	
  down	
  
Factoring in the type of update


      flagged for review"           no
     resulting in potential   computational
        improvement"            prediction"
    (add novel annotation)"     available"
             20%"                 15%"
                                          Not flagged "
                                          for review: "
                                       no discrepancies"
                                      with computational
                                          predictions"
      flagged for review "                    16%"
     resulting in potential
         improvement"
       (refinement and "            flagged for
                                     review: "
         removal only)"
                                   no change
             33%"
                                    needed"
                                      16%"
Factoring in the type of update


      flagged for review"           no
     resulting in potential   computational
        improvement"            prediction"
    (add novel annotation)"     available"
             20%"                 15%"
                                          Not flagged "
                                          for review: "
                                       no discrepancies"
                                      with computational
                                          predictions"
      flagged for review "                    16%"
     resulting in potential
         improvement"
       (refinement and "            flagged for
                                     review: "
         removal only)"
                                   no change
             33%"
                                    needed"
                                      16%"
Attributes of flagged genes

 What	
  are	
  factors	
  that	
  enrich	
  for	
  genes	
  missing	
  annota.ons?	
  


•    Type	
  of	
  discrepancy	
  
•    GO	
  aspect	
  
•    Amount	
  of	
  literature	
  for	
  a	
  gene	
  
•    Source	
  of	
  the	
  computa.onal	
  predic.on	
  
•    Number	
  of	
  computa.onal	
  sources	
  with	
  discrepancies	
  
Attributes of flagged genes

 What	
  are	
  factors	
  that	
  enrich	
  for	
  genes	
  missing	
  annota.ons?	
  


•    Type	
  of	
  discrepancy	
  
•    GO	
  aspect	
  
•    Amount	
  of	
  literature	
  for	
  a	
  gene	
  
•    Source	
  of	
  the	
  computa.onal	
  predic.on	
  
•    Number	
  of	
  computa.onal	
  sources	
  with	
  discrepancies	
  
Analysis by Class of Discrepancies



    M
            Shallow class
    C




            Mismatch class
M       C
Analysis by Class of Discrepancies

                              % updatable


    M
            Shallow class       78.8%
    C




            Mismatch class      59.2%
M       C
Types of annotation updates by class

     Mismatch                                     Shallow




          18
42                                                               5 16
                     32                                           1 4
                                                  138
          20
                                                            24
     16        15                                                 20
          14




                    Number of genes needing:
                      annotation refinement
                      annotation removal
                      novel annotation addition
Summary & Conclusions

•  Majority of S. cerevisiae literature-based GO annotations are good

•  Comparing manual vs. computational prediction can identify genes
   whose annotations need updating

•  Additional work needs to be done to pinpoint these annotations
   and genes
Summary & Conclusions

•  Majority of S. cerevisiae literature-based GO annotations are good

•  Comparing manual vs. computational prediction can identify genes
   whose annotations need updating

•  Additional work needs to be done to pinpoint these annotations
   and genes




             It works but
       there is still work to do!
Future plans
•  Identify predictive features of genes that need updating
 -  Are there specific GO terms used for manual curation more likely to be updated?
 -  Do specific computational predictions indicate a GO term should be updated?
 -  Examine node distance between GO terms used for computational and
    literature-based annotations
 -  Examine contribution of annotation date and new publications
 -  A combination of or all of the above?

•  Evaluate the accuracy of computational predictions for S. cerevisiae

•  Expand to evaluate annotations made based on orthology
 -  Annotations from GOC PAINT project


•  Develop a pipeline for curation prioritization at SGD

•  Extend to other annotation projects
GO	
  Consor.um	
  




http://www.geneontology.org/
Saccharomyces	
  Genome	
  Database	
  staff	
  




    @yeastgenome                                    http://on.fb.me/ksgskb
                     sgd-helpdesk@lists.stanford.edu
                                                                                  	
  
      Supported	
  by	
  NIH,	
  Na.onal	
  Human	
  Genome	
  Research	
  Ins.tute

More Related Content

Recently uploaded

Collective Mining | Corporate Presentation | April 2024
Collective Mining | Corporate Presentation | April 2024Collective Mining | Corporate Presentation | April 2024
Collective Mining | Corporate Presentation | April 2024CollectiveMining1
 
Corporate Presentation Probe April 2024.pdf
Corporate Presentation Probe April 2024.pdfCorporate Presentation Probe April 2024.pdf
Corporate Presentation Probe April 2024.pdfProbe Gold
 
Q1 Probe Gold Quarterly Update- April 2024
Q1 Probe Gold Quarterly Update- April 2024Q1 Probe Gold Quarterly Update- April 2024
Q1 Probe Gold Quarterly Update- April 2024Probe Gold
 
Mandalay Resources 2024 April IR Presentation
Mandalay Resources 2024 April IR PresentationMandalay Resources 2024 April IR Presentation
Mandalay Resources 2024 April IR PresentationMandalayResources
 
Corporate Presentation Probe April 2024.pdf
Corporate Presentation Probe April 2024.pdfCorporate Presentation Probe April 2024.pdf
Corporate Presentation Probe April 2024.pdfProbe Gold
 
Collective Mining | Corporate Presentation - April 2024
Collective Mining | Corporate Presentation - April 2024Collective Mining | Corporate Presentation - April 2024
Collective Mining | Corporate Presentation - April 2024CollectiveMining1
 
Q1 Quarterly Update - April 16, 2024.pdf
Q1 Quarterly Update - April 16, 2024.pdfQ1 Quarterly Update - April 16, 2024.pdf
Q1 Quarterly Update - April 16, 2024.pdfProbe Gold
 
Leveraging USDA Rural Development Grants for Community Growth and Sustainabil...
Leveraging USDA Rural Development Grants for Community Growth and Sustainabil...Leveraging USDA Rural Development Grants for Community Growth and Sustainabil...
Leveraging USDA Rural Development Grants for Community Growth and Sustainabil...USDAReapgrants.com
 
Osisko Gold Royalties Ltd - Corporate Presentation, April 10, 2024
Osisko Gold Royalties Ltd - Corporate Presentation, April 10, 2024Osisko Gold Royalties Ltd - Corporate Presentation, April 10, 2024
Osisko Gold Royalties Ltd - Corporate Presentation, April 10, 2024Osisko Gold Royalties Ltd
 
slideshare_2404_presentation materials_en.pdf
slideshare_2404_presentation materials_en.pdfslideshare_2404_presentation materials_en.pdf
slideshare_2404_presentation materials_en.pdfsansanir
 
the 25 most beautiful words for a loving and lasting relationship.pdf
the 25 most beautiful words for a loving and lasting relationship.pdfthe 25 most beautiful words for a loving and lasting relationship.pdf
the 25 most beautiful words for a loving and lasting relationship.pdfFrancenel
 

Recently uploaded (12)

Collective Mining | Corporate Presentation | April 2024
Collective Mining | Corporate Presentation | April 2024Collective Mining | Corporate Presentation | April 2024
Collective Mining | Corporate Presentation | April 2024
 
Corporate Presentation Probe April 2024.pdf
Corporate Presentation Probe April 2024.pdfCorporate Presentation Probe April 2024.pdf
Corporate Presentation Probe April 2024.pdf
 
Q1 Probe Gold Quarterly Update- April 2024
Q1 Probe Gold Quarterly Update- April 2024Q1 Probe Gold Quarterly Update- April 2024
Q1 Probe Gold Quarterly Update- April 2024
 
Mandalay Resources 2024 April IR Presentation
Mandalay Resources 2024 April IR PresentationMandalay Resources 2024 April IR Presentation
Mandalay Resources 2024 April IR Presentation
 
Corporate Presentation Probe April 2024.pdf
Corporate Presentation Probe April 2024.pdfCorporate Presentation Probe April 2024.pdf
Corporate Presentation Probe April 2024.pdf
 
Collective Mining | Corporate Presentation - April 2024
Collective Mining | Corporate Presentation - April 2024Collective Mining | Corporate Presentation - April 2024
Collective Mining | Corporate Presentation - April 2024
 
Q1 Quarterly Update - April 16, 2024.pdf
Q1 Quarterly Update - April 16, 2024.pdfQ1 Quarterly Update - April 16, 2024.pdf
Q1 Quarterly Update - April 16, 2024.pdf
 
Korea District Heating Corporation 071320 Algorithm Investment Report
Korea District Heating Corporation 071320 Algorithm Investment ReportKorea District Heating Corporation 071320 Algorithm Investment Report
Korea District Heating Corporation 071320 Algorithm Investment Report
 
Leveraging USDA Rural Development Grants for Community Growth and Sustainabil...
Leveraging USDA Rural Development Grants for Community Growth and Sustainabil...Leveraging USDA Rural Development Grants for Community Growth and Sustainabil...
Leveraging USDA Rural Development Grants for Community Growth and Sustainabil...
 
Osisko Gold Royalties Ltd - Corporate Presentation, April 10, 2024
Osisko Gold Royalties Ltd - Corporate Presentation, April 10, 2024Osisko Gold Royalties Ltd - Corporate Presentation, April 10, 2024
Osisko Gold Royalties Ltd - Corporate Presentation, April 10, 2024
 
slideshare_2404_presentation materials_en.pdf
slideshare_2404_presentation materials_en.pdfslideshare_2404_presentation materials_en.pdf
slideshare_2404_presentation materials_en.pdf
 
the 25 most beautiful words for a loving and lasting relationship.pdf
the 25 most beautiful words for a loving and lasting relationship.pdfthe 25 most beautiful words for a loving and lasting relationship.pdf
the 25 most beautiful words for a loving and lasting relationship.pdf
 

Featured

2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by HubspotMarius Sescu
 
Everything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPTEverything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPTExpeed Software
 
Product Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsProduct Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsPixeldarts
 
How Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthHow Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthThinkNow
 
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfAI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfmarketingartwork
 
PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024Neil Kimberley
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)contently
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024Albert Qian
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsKurio // The Social Media Age(ncy)
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Search Engine Journal
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summarySpeakerHub
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next Tessa Mero
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentLily Ray
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best PracticesVit Horky
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project managementMindGenius
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...RachelPearson36
 

Featured (20)

2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot
 
Everything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPTEverything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPT
 
Product Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsProduct Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage Engineerings
 
How Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthHow Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental Health
 
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfAI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
 
Skeleton Culture Code
Skeleton Culture CodeSkeleton Culture Code
Skeleton Culture Code
 
PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie Insights
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search Intent
 
How to have difficult conversations
How to have difficult conversations How to have difficult conversations
How to have difficult conversations
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best Practices
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project management
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
 

Using computational predictions to improve literature-based Gene Ontology annotations, Julie Park

  • 1. Using  computa.onal  predic.ons  to  improve   literature-­‐based  Gene  Ontology  annota.ons   Julie  Park,  Ph.D.   Saccharomyces  Genome  Database    •    hAp://www.yeastgenome.org/   Department  of  Gene.cs  •  Stanford  University  School  of  Medicine  
  • 2. Attaining curation nirvana… •  Curation efficiency •  Annotation consistency •  Data accuracy
  • 3. …is not easy! Annotation errors 1.  Mistakes  in  capturing                        the  annota.on     2.  Outdated  informa.on     3.  Missing  annota.ons   How  can  you  find  these  errors?  
  • 4. Flavors of GO annotations 1.  Literature-based – “Manual” Individually assigned by biocurators based on the published literature 2.  Computationally-predicted – “Computational” Automatically generated by in silico methods such as protein signatures or computational algorithms Sources of computational predictions in SGD InterPro 1 Swiss-Prot Keywords (SPKW) 2 YeastFunc 3 BioPixie 4 1. Camon, et al (2003) Genome Res. 13:662-72 2. http://www.ebi.ac.uk/GOA/Swiss-ProtKeyword2GO.html 3. Tian, et al (2008) Genome Biol. 9 Suppl 1:S7 4. Huttenhower and Troyanskaya (2008) Bioinformatics. 24:i330-8
  • 5. Flavors of GO annotations 1.  Literature-based – “Manual” Individually assigned by biocurators based on the published literature 2.  Computationally-predicted – “Computational” Automatically generated by in silico methods such as protein signatures or computational algorithms Sources of computational predictions in SGD InterPro 1 Swiss-Prot Keywords (SPKW) 2 YeastFunc 3 BioPixie 4 Is  it  possible  to  take  advantage  of  the  strengths  of  computa=onal     predic=ons  and  leverage  these  annota=ons  to  improve  manual   ones?    
  • 6. CvManGO: Computational vs. Manual GO Annotations M ✓   C ✓   ✓   M ✗   ✓   C ✓   M
  • 7. CvManGO: Computational vs. Manual GO Annotations M ✗   C Do  discrepancies  between  a  literature-­‐based  annota.on     and  a  computa.onal  predic.on  indicate  that  the     manual  annota.on  needs  to  be  updated?    
  • 8. CvManGO: Computational vs. Manual GO Annotations C OK M = C ✓   M M Discrepancies ✗   No parent-child C M relationship C between terms
  • 9. CvManGO: Computational vs. Manual GO Annotations C M = C M M 4379  genes   Reviewed     No parent-child 336  genes   C M relationship between terms C from  October  2009  gene_associa=on.sgd  file     (6353  total  genes)  
  • 10. Discrepancies can identify genes that need updating 100%" 90%" percentage of total genes reviewed" 80%" 70%" 60%" 50%" no change" 40%" updatable" 30%" 20%" 10%" 0%" Control   Genes  flagged  with   discrepant  annota.on  pairs  
  • 11. Extrapolating to the entire genome no computational prediction" available" 15%" Not flagged " for review: " flagged for review:" no discrepancies" resulting in potential with computational improvement" predictions" 53%" 16%" flagged for review: " no change needed" 16%" S.ll  requires  reviewing  4379/6353  genes—can  we  narrow  this  down  
  • 12. Factoring in the type of update flagged for review" no resulting in potential computational improvement" prediction" (add novel annotation)" available" 20%" 15%" Not flagged " for review: " no discrepancies" with computational predictions" flagged for review " 16%" resulting in potential improvement" (refinement and " flagged for review: " removal only)" no change 33%" needed" 16%"
  • 13. Factoring in the type of update flagged for review" no resulting in potential computational improvement" prediction" (add novel annotation)" available" 20%" 15%" Not flagged " for review: " no discrepancies" with computational predictions" flagged for review " 16%" resulting in potential improvement" (refinement and " flagged for review: " removal only)" no change 33%" needed" 16%"
  • 14. Attributes of flagged genes What  are  factors  that  enrich  for  genes  missing  annota.ons?   •  Type  of  discrepancy   •  GO  aspect   •  Amount  of  literature  for  a  gene   •  Source  of  the  computa.onal  predic.on   •  Number  of  computa.onal  sources  with  discrepancies  
  • 15. Attributes of flagged genes What  are  factors  that  enrich  for  genes  missing  annota.ons?   •  Type  of  discrepancy   •  GO  aspect   •  Amount  of  literature  for  a  gene   •  Source  of  the  computa.onal  predic.on   •  Number  of  computa.onal  sources  with  discrepancies  
  • 16. Analysis by Class of Discrepancies M Shallow class C Mismatch class M C
  • 17. Analysis by Class of Discrepancies % updatable M Shallow class 78.8% C Mismatch class 59.2% M C
  • 18. Types of annotation updates by class Mismatch Shallow 18 42 5 16 32 1 4 138 20 24 16 15 20 14 Number of genes needing: annotation refinement annotation removal novel annotation addition
  • 19. Summary & Conclusions •  Majority of S. cerevisiae literature-based GO annotations are good •  Comparing manual vs. computational prediction can identify genes whose annotations need updating •  Additional work needs to be done to pinpoint these annotations and genes
  • 20. Summary & Conclusions •  Majority of S. cerevisiae literature-based GO annotations are good •  Comparing manual vs. computational prediction can identify genes whose annotations need updating •  Additional work needs to be done to pinpoint these annotations and genes It works but there is still work to do!
  • 21. Future plans •  Identify predictive features of genes that need updating -  Are there specific GO terms used for manual curation more likely to be updated? -  Do specific computational predictions indicate a GO term should be updated? -  Examine node distance between GO terms used for computational and literature-based annotations -  Examine contribution of annotation date and new publications -  A combination of or all of the above? •  Evaluate the accuracy of computational predictions for S. cerevisiae •  Expand to evaluate annotations made based on orthology -  Annotations from GOC PAINT project •  Develop a pipeline for curation prioritization at SGD •  Extend to other annotation projects
  • 23. Saccharomyces  Genome  Database  staff   @yeastgenome http://on.fb.me/ksgskb sgd-helpdesk@lists.stanford.edu   Supported  by  NIH,  Na.onal  Human  Genome  Research  Ins.tute