HVP Critical Assessment of Genome Interpretation


Note: CAGI occurred in Dec 2010, after I left Berkeley. Susanna Repo made the event happen and it would not have occurred without her.

  1. ca·gey (ˈkā-jē), adjective. 1: hesitant about committing oneself; 2a: wary of being trapped or deceived; 2b: marked by cleverness.
     CAGI (ˈkā-jē): Critical Assessment of Genome Interpretation
     A community experiment to evaluate phenotype prediction
     Reece Hart (with Steven Brenner and John Moult)
     QB3 / Center for Computational Biology, UC Berkeley
     reece@berkeley.edu
     Human Variome Project Meeting, Paris, 2010-05-12
  2. The Significance of “Variants of Uncertain Significance”
     “VUS – Variant of uncertain significance. A variation in a genetic sequence whose association with disease risk is unknown. Also called variant of uncertain significance, variant of unknown significance, and unclassified variant.”
     http://www.cancer.gov/cancertopics/genetics-terms-alphalist
  3. The long tail of rare diseases.
     “A rare disease typically affects a patient population estimated at fewer than 200,000 in the U.S. There are more than 6,000 rare diseases known today and they affect an estimated 25 million persons in the U.S.”
     NIH Office of Rare Diseases Research, http://rarediseases.info.nih.gov/
  4. Interpretation of Unclassified Variants: a sampling of responses from genetic counselors
     ➢ Routinely used ➢ Selectively used
     ● dbSNP ● PharmGKB ● OMIM ● LSDBs ● GeneReviews ● Domain prediction ● PolyPhen ● Structure impact ● SIFT analysis ● PubMed ● Homology ● Mailing lists
  5. Genome Variant Impact Prediction Tools: an incomplete list
     Program              URL
     Align-GVGD           http://agvgd.iarc.fr/
     AutoMute             http://proteins.gmu.edu/automute/
     CUPSAT               http://cupsat.tu-bs.de/
     Dmutant              http://sparks.informatics.iupui.edu/hzhou/mutation.html
     nsSNPAnalyzer        http://snpanalyzer.uthsc.edu/
     PantherPSEC          http://www.pantherdb.org/tools/csnpScoreForm.jsp
     PhD-SNP              http://gpcr.biocomp.unibo.it/~emidio/PhD-SNP/PhD-SNP.htm
     Pmut                 http://mmb2.pcb.ub.es:8080/PMut/
     PolyPhen             http://coot.embl.de/PolyPhen/
     SIFT                 http://sift.jcvi.org/
     SNAP                 http://cubic.bioc.columbia.edu/services/snap/
     SNP Function Pred.   http://www.ensembl.org/ [N.B. login required]
     SNPinfo / FuncPred   http://snpinfo.niehs.nih.gov/snpfunc.htm
     SNPs3D               http://snps3d.org/
     UMD-predictor        http://www.umd.be/
  6. Current methods are the tip of the iceberg.
     [Figure: iceberg diagram. Labels: protein transcripts, non-protein transcripts, repeats, indels, epigenetics (mC). Proportions shown: ~99% and ~1%.]
  7. Objectively Assessing Computational Predictions
     ➢ CASP: structure prediction
     ➢ CAPRI: protein-protein docking
     ➢ EGASP: ENCODE gene annotation
     ➢ RGASP: RNA-Seq mapping
     ➢ DREAM: network model assessment
     [Diagram: timeline from Data Acquisition to Publication.] The Prediction Window: the ~1-12 months when unpublished high-quality data are available.
  8. CAGI: Critical Assessment of Genome Interpretation
     A community assessment of the state of the art in phenotype prediction.
     ➢ Follow the successful critical assessment framework:
       ● Solicit pre-publication genotype-phenotype associations
       ● Provide genomic data to predictors and collect their predictions
       ● Assess predictions against revealed annotations, mechanisms, and phenotypes
  9. Sample Prediction Categories
     Molecular, Cellular, Organismal
     ● MTHFR mutants: yeast growth rates with various MTHFR mutations and [folate]. (Jasper Rine)
     ● Breast Cancer: segregation of rare variants among 2500 cases and controls. (Sean Tavtigian)
     ● PGP100: unpublished phenotypes from the PGP100 project. (George Church)
     Please contact us if you have pre-publication genotype-phenotype association data.
  10. Census of Molecular Mechanisms: possible mechanisms of variant impact for WTCCC SNVs
      Wellcome Trust Case Control Consortium. Nature. 2007;447(7145):661-78.
  11. Contributors, Predictors, Assessors: an incomplete list of participants
      Gad Getz, Sean Tavtigian, Rachel Karchin, Jasper Rine, Pauline Ng, Marc Greenblatt, Mauno Vihinen, George Church
  12. Sample CAGI Timeline
      Dates are for illustration; exact dates have not been set.
      [Timeline figure: weekly axis ticks from 05-03 to 01-31. Phases: Data Gathering, Prediction Season, Assessment.]
      Key Dates: ▲ finalize data sources ▲ workshop ▲ release prospectus / rules ▲ open participant registration
  13. CAGI Summary
      ➢ CAGI will:
        ● objectively assess phenotype prediction methods
        ● inform future research directions
        ● introduce researchers in diverse fields
      ➢ CAGI is being planned for the end of 2010 or early 2011.
      ➢ Now seeking data contributors, assessors, and predictors.
      ➢ Feedback is sought! reece@berkeley.edu
      ➢ See http://genomecommons.org/cagi for more information.
  15. The Genome Commons: A Flagship Project Within QB3
      [Map figure; scale bar: 10 km]
  16. Program in Translational Genomics
      ● Steven Brenner: Plant & Mol. Biology, UC Berkeley
      ● Reece Hart: Chief Scientist, UC Berkeley
      ● Sandrine Dudoit: Biostatistics, UC Berkeley
      ● Robert Nussbaum: Chief, Medical Genetics, UCSF
      ● Jasper Rine: Genetics, Genomics & Dev; Chair, Computational Biology, UC Berkeley
      ● Lior Pachter: Mathematics; Mol., Cell, Biol, UC Berkeley
      ● Bernie Lo: Director, Medical Ethics; Department of Medicine, UCSF
      Also listed: Rasmus Nielsen, Michael I. Jordan, Ian Holmes, Kimmen Sjölander, Yun Song, Monty Slatkin, Terry Speed, Mark van der Laan, Richard Karp, Bernd Sturmfels, Steven Evans, Elizabeth Purdom, Haiyan Huang, Peter Bickel, Susan Marqusee, Michael Eisen, Lisa Barcellos, Rachel Brem, Tom Alber
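
Editorial sketch (not from the talk): the assessment step described on the CAGI slide, where predictions are collected and then compared with phenotypes revealed later, might look roughly like the Python below. The variant IDs, phenotype labels, predictor names, and the choice of plain accuracy as the score are all assumptions made for illustration; the slides do not say how CAGI would score submissions.

```python
from typing import Dict

# Hypothetical data model: maps a variant ID to a phenotype label.
# All IDs, labels, and predictor names below are invented for illustration.
Labels = Dict[str, str]


def accuracy(predictions: Labels, revealed: Labels) -> float:
    """Fraction of revealed variants whose predicted label matches.

    Variants a predictor skipped count as wrong, so abstaining is not rewarded.
    """
    if not revealed:
        raise ValueError("no revealed phenotypes to assess against")
    correct = sum(
        1 for variant, label in revealed.items()
        if predictions.get(variant) == label
    )
    return correct / len(revealed)


def assess(submissions: Dict[str, Labels], revealed: Labels) -> Dict[str, float]:
    """Score every predictor's submission against the same revealed answers."""
    return {name: accuracy(preds, revealed) for name, preds in submissions.items()}


# Answers held back until the prediction window closes (assumed labels).
revealed = {
    "var1": "deleterious",
    "var2": "neutral",
    "var3": "deleterious",
    "var4": "neutral",
}

# Submissions collected from two hypothetical predictors.
submissions = {
    "predictor_a": {"var1": "deleterious", "var2": "neutral",
                    "var3": "neutral", "var4": "neutral"},
    "predictor_b": {"var1": "deleterious", "var2": "deleterious"},
}

scores = assess(submissions, revealed)
print(scores)
```

Scoring every submission against the same held-back answers is what makes the comparison blind; a real assessment would likely use measures suited to each dataset rather than a single accuracy figure.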