CAFA poster presented at CSHL Genome Informatics 2013
Presentation Transcript

  • Critical Assessment of Function Annotations: Lessons Learned and the Road Ahead

    Iddo Friedberg (1,*), Wyatt T Clark (2,3), Alexandra M Schnoes (4), Patricia C Babbitt (4), Sean D Mooney (5) and Predrag Radivojac

    Introduction

    To understand and improve our ability to computationally annotate proteins, we are holding a series of multi-year challenges to the developers of function annotation programs. The rationale is that having these programs challenged and assessed will lead to understanding and improving predictive ability. The first Critical Assessment of Function Annotation (CAFA 1) was held over 2010-2011, involved 23 research groups and assessed the performance of 54 algorithms. CAFA 1 was structured as a time-challenge: proteins that had no experimentally validated function annotation were presented to the methods, which predicted their function. Over the course of 10 months, some of these proteins gained experimental validation, and those were used as the final benchmark to assess program performance.

    Understanding protein function is a key component to understanding life at a molecular level. It is also important for understanding and treating human disease, since many conditions arise as a consequence of the loss or gain of protein function.

    Here we review CAFA 1, and introduce CAFA 2, which is taking place 2013-2014.

    Participating Methods

    Predictions on Human and Mouse

    [Figures not included in transcript. Panel labels: BPO (Biological Process Ontology), MFO (Molecular Function Ontology).]

    Database Bias

    There is extensive bias in experimentally validated annotations in UniProt-GOA. The bias is contributed by high-throughput (HT) experiments. Many HT experimental annotations create redundancies.

    Figure caption: A circle represents the sum total of articles annotating each organism. Each colored arc is composed of all the proteins in a single article. A line is drawn between any two points on the circle if the proteins they represent have 100% sequence identity: a black line if they are annotated with different ontologies (for example, in one article the protein is annotated with the MFO, and in another article with the BPO); a red line if they are annotated in the same ontology. Example: S. pombe is described by two articles, one with few proteins (light arc on bottom) and one with many (dark arc encompassing most of the circle). Many of the same proteins are annotated by both articles.

    The CAFA Experiment: Generating Targets

    [Figure not included in transcript.]

    Assessing Method Performance

    Precision: pr = TP/(TP+FP)
    Recall: rc = TP/(TP+FN)
    F-measure: F1 = 2 × (pr × rc)/(pr + rc)

    Case Study: hPNPase

    Figure caption: (a) Domain architecture of the human PNPT1 gene according to the Pfam classification. For each domain, the number of different leaf terms associated with any protein in the Swiss-Prot database containing this domain is shown. (b) Molecular Function terms (six of which are leaves) associated with the human PNPT1 gene in Swiss-Prot as of December 2011. Colored circles represent the predicted terms for three representative methods as well as two baseline methods. The prediction threshold for each method was selected to correspond to the point in the precision-recall space that provides the maximum F-measure. J (blue), Jones-UCL; O (magenta), Team Orengo; d (navy blue), dcGO; B (green), BLAST; N (brown), Naive. Dashed lines indicate the presence of other terms between the source and destination nodes.

    New in CAFA 2

      • Engaging more communities: Experimental Biologists, Computer Scientists, Computational Biologists, Biocurators. Diagram labels: Algorithms, Assessment methods; Targets; CAFA Targets & Ontologies.
      • Human Phenotype Ontology
      • Cellular Component Ontology
      • Reassessing CAFA 1 methods

    CAFA 2 Organization

    Roles listed: Steering Committee, Organizing Committee, Data Wrangler, CAFA 2 Assessor.
    Names listed: Iddo Friedberg, Michal Linial, Mark Wass, Sean D Mooney, Predrag Radivojac, Tal Ronen Oron, Patricia Babbitt, Steven Brenner, Christine Orengo, Burkhard Rost, Anna Tramontano.

    Author Affiliations

    1. Miami University, Oxford, OH
    2. Indiana University, Bloomington, IN
    3. Yale University, New Haven, CT
    4. University of California San Francisco, CA
    5. Buck Institute for Research on Aging, CA
    * i.friedberg@miamioh.edu

    References and more information

    CAFA: Radivojac et al. (2013) Nature Methods, doi:10.1038/nmeth.2340; http://BioFunctionPrediction.org
    Database Bias: Schnoes et al. (2013) PLoS Computational Biology, doi:10.1371/journal.pcbi.1003063
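
    The measures listed under Assessing Method Performance, and the rule quoted in the hPNPase case study of picking the threshold that gives the maximum F-measure, can be sketched in a few lines of Python. This is a minimal illustration for a single protein, not the CAFA assessment code. The function names, the term labels and the scores are hypothetical. Returning 0.0 when a denominator is zero is an assumption. How the assessment aggregates over many proteins is not stated in this transcript.

```python
from typing import Dict, Iterable, Optional, Set, Tuple


def precision_recall(predicted: Set[str], true: Set[str]) -> Tuple[float, float]:
    """pr = TP/(TP+FP), rc = TP/(TP+FN) for one protein's term sets."""
    tp = len(predicted & true)   # terms predicted and experimentally annotated
    fp = len(predicted - true)   # terms predicted but not annotated
    fn = len(true - predicted)   # annotated terms the method missed
    # Assumption: an empty denominator yields 0.0 rather than an error.
    pr = tp / (tp + fp) if (tp + fp) else 0.0
    rc = tp / (tp + fn) if (tp + fn) else 0.0
    return pr, rc


def f_measure(pr: float, rc: float) -> float:
    """F1 = 2 * (pr * rc) / (pr + rc), the harmonic mean of pr and rc."""
    return 2 * pr * rc / (pr + rc) if (pr + rc) else 0.0


def best_threshold(scores: Dict[str, float], true: Set[str],
                   thresholds: Iterable[float]) -> Tuple[Optional[float], float]:
    """Return (threshold, F1) for the threshold with the highest F1."""
    best_t, best_f = None, -1.0
    for t in thresholds:
        # A term counts as predicted when its score reaches the threshold.
        predicted = {term for term, score in scores.items() if score >= t}
        f = f_measure(*precision_recall(predicted, true))
        if f > best_f:
            best_t, best_f = t, f
    return best_t, best_f


# Hypothetical scores from one method for one protein.
scores = {"term_a": 0.9, "term_b": 0.6, "term_c": 0.3}
true_terms = {"term_a", "term_c"}
best_t, best_f = best_threshold(scores, true_terms, [0.2, 0.5, 0.8])
print(best_t, round(best_f, 3))  # prints: 0.2 0.8
```

    In this toy case the lowest threshold wins: it recovers both true terms at the cost of one false positive, giving pr = 2/3 and rc = 1.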
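
    The time-challenge described in the Introduction amounts to comparing two annotation snapshots: one taken when targets are released and one taken after the waiting period. The sketch below shows that selection step under stated assumptions. The function name, the dictionary layout (protein ID mapped to a set of experimentally validated terms) and the example IDs are all hypothetical, and the real CAFA pipeline may apply further filters that this transcript does not describe.

```python
from typing import Dict, Set


def select_benchmark(targets: Set[str],
                     annotations_t0: Dict[str, Set[str]],
                     annotations_t1: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Keep targets with no experimental annotation at t0 that have some at t1."""
    benchmark: Dict[str, Set[str]] = {}
    for protein in targets:
        if annotations_t0.get(protein):
            # Already experimentally annotated at release time: not a valid target.
            continue
        gained = annotations_t1.get(protein, set())
        if gained:
            # The newly validated terms become the ground truth for assessment.
            benchmark[protein] = set(gained)
    return benchmark


# Hypothetical snapshots: P1 gains an annotation, P2 never does,
# P3 was already annotated when targets were released.
targets = {"P1", "P2", "P3"}
t0 = {"P3": {"term_x"}}
t1 = {"P1": {"term_a"}, "P3": {"term_x", "term_y"}}
benchmark = select_benchmark(targets, t0, t1)
print(benchmark)  # prints: {'P1': {'term_a'}}
```

    Only P1 enters the benchmark, which mirrors the poster's statement that some of the targets gained experimental validation and those were used for the final assessment.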