ca·gey ˈkā-jē adjective
1: hesitant about committing oneself;
2a: wary of being trapped or deceived;
2b: marked by cleverness




CAGI (ˈkā-jē)
Critical Assessment of Genome Interpretation
A community experiment to evaluate phenotype prediction

Reece Hart (with Steven Brenner and John Moult)
QB3 / Center for Computational Biology
UC Berkeley
reece@berkeley.edu

Human Variome Project Meeting
Paris 2010-05-12
The Significance of
 “Variants of Uncertain Significance”




“VUS – Variant of uncertain significance. A variation
in a genetic sequence whose association with
disease risk is unknown. Also called variant of
uncertain significance, variant of unknown
significance, and unclassified variant.”
http://www.cancer.gov/cancertopics/genetics-terms-alphalist


The long tail of rare diseases.



“A rare disease typically affects a patient
population estimated at fewer than 200,000 in
the U.S. There are more than 6,000 rare
diseases known today and they affect an
estimated 25 million persons in the U.S.”
NIH Office of Rare Diseases Research
http://rarediseases.info.nih.gov/




Interpretation of Unclassified Variants
    a sampling of responses from genetic counselors


➢   Routinely used
    ●   dbSNP
    ●   OMIM
    ●   GeneReviews
    ●   PolyPhen
    ●   SIFT
    ●   PubMed
    ●   Mailing lists

➢   Selectively used
    ●   PharmGKB
    ●   LSDBs
    ●   Domain prediction
    ●   Structure impact analysis
    ●   Homology




Genome Variant Impact Prediction Tools
                            an incomplete list

Program              URL
Align-GVGD           http://agvgd.iarc.fr/
AutoMute             http://proteins.gmu.edu/automute/
CUPSAT               http://cupsat.tu-bs.de/
Dmutant              http://sparks.informatics.iupui.edu/hzhou/mutation.html
nsSNPAnalyzer        http://snpanalyzer.uthsc.edu/
PantherPSEC          http://www.pantherdb.org/tools/csnpScoreForm.jsp
PhD-SNP              http://gpcr.biocomp.unibo.it/~emidio/PhD-SNP/PhD-SNP.htm
Pmut                 http://mmb2.pcb.ub.es:8080/PMut/

PolyPhen             http://coot.embl.de/PolyPhen/
SIFT                 http://sift.jcvi.org/
SNAP                 http://cubic.bioc.columbia.edu/services/snap/
SNP Function Pred.   http://www.ensembl.org/ [N.B. login required]
SNPinfo / FuncPred   http://snpinfo.niehs.nih.gov/snpfunc.htm
SNPs3D               http://snps3d.org/
UMD-predictor        http://www.umd.be/
Current methods are the tip of the iceberg.


[Figure: iceberg diagram. Labeled categories: protein transcripts, non-protein transcripts, repeats, indels, epigenetics. Proportions shown: ~1% and ~99%.]




Objectively Assessing Computational Predictions

 ➢   CASP – Structure prediction
 ➢   CAPRI – Protein-protein docking
 ➢   EGASP – ENCODE gene annotation
 ➢   RGASP – RNA-Seq mapping
 ➢   DREAM – network model assessment



[Figure: timeline from Data Acquisition to Publication]

The Prediction Window:
the ~1-12 months when unpublished, high-quality data are available
CAGI – Critical Assessment of Genome Interpretation
A community assessment of the state-of-the-art in phenotype prediction.



➢   Follow the successful critical assessment framework:

    ●   Solicit pre-publication genotype-phenotype associations

    ●   Provide genomic data to predictors and collect their predictions

    ●   Assess predictions against revealed annotations, mechanisms, and phenotypes
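
The assessment step of this framework can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not CAGI's actual scoring procedure: it assumes each prediction is a single binary call (deleterious or not) per variant, and that plain accuracy against the revealed labels is the metric. The names `Prediction`, `accuracy_by_predictor`, the group names, and the variant identifiers are all hypothetical, and the data are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    predictor: str     # hypothetical name of the submitting group
    variant: str       # hypothetical variant identifier
    deleterious: bool  # the submitted binary call (an assumed format)


def accuracy_by_predictor(predictions, revealed):
    """Return {predictor: fraction of calls that match the revealed labels}.

    Calls on variants that have no revealed label are skipped.
    """
    correct = {}
    total = {}
    for p in predictions:
        if p.variant not in revealed:
            continue
        total[p.predictor] = total.get(p.predictor, 0) + 1
        if p.deleterious == revealed[p.variant]:
            correct[p.predictor] = correct.get(p.predictor, 0) + 1
    return {name: correct.get(name, 0) / n for name, n in total.items()}


# Made-up labels, standing in for data revealed after the prediction season.
revealed = {"var1": True, "var2": False, "var3": True}

# Made-up submissions collected while the labels were still unpublished.
predictions = [
    Prediction("group_a", "var1", True),
    Prediction("group_a", "var2", True),
    Prediction("group_a", "var3", True),
    Prediction("group_b", "var1", True),
    Prediction("group_b", "var2", False),
    Prediction("group_b", "var4", True),  # no revealed label, so ignored
]

scores = accuracy_by_predictor(predictions, revealed)
```

The point of the sketch is the ordering: predictions are frozen before the labels are revealed, so every predictor is scored against the same held-out answers. A real assessment would likely use richer metrics and continuous phenotypes.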

Sample Prediction Categories

➢   Molecular: MTHFR mutants – Yeast growth rates with various MTHFR mutations and [folate]. (Jasper Rine)
➢   Cellular: Breast Cancer – Segregation of rare variants among 2500 cases and controls. (Sean Tavtigian)
➢   Organismal: PGP100 – Unpublished phenotypes from PGP100 project. (George Church)

Please contact us if you have pre-publication genotype-phenotype
association data.
Census of Molecular Mechanisms
possible mechanisms of variant impact for WTCCC SNVs




Wellcome Trust Case Control Consortium. Nature. 2007;447(7145):661-78.
Contributors, Predictors, Assessors
                an incomplete list of participants




Gad Getz           Sean Tavtigian    Rachel Karchin   Jasper Rine




Pauline Ng         Marc Greenblatt   Mauno Vihinen    George Church
Sample CAGI Timeline

       Dates are for illustration – exact dates have not been set.
[Figure: weekly timeline running from 05-03 to 01-31, with three successive phases: Data Gathering, Prediction Season, Assessment]

Key Dates
      ▲ finalize data sources
      ▲ release prospectus / rules
      ▲ open participant registration
      ▲ workshop
CAGI Summary
➢   CAGI will:
    ●   objectively assess phenotype prediction methods
    ●   inform future research directions
    ●   introduce researchers in diverse fields to one another

➢   CAGI is being planned for the end of 2010
    or early 2011.

➢   Now seeking data contributors, assessors,
    and predictors.

➢   Feedback is sought!       reece@berkeley.edu

➢   See http://genomecommons.org/cagi for more
    information.
The Genome Commons:
A Flagship Project Within QB3




Program in Translational Genomics

Steven Brenner – Plant & Mol. Biology, UC Berkeley
Reece Hart – Chief Scientist, UC Berkeley
Sandrine Dudoit – Biostatistics, UC Berkeley
Robert Nussbaum – Chief, Medical Genetics, UCSF
Jasper Rine – Genetics, Genomics & Dev; Chair, Computational Biology, UC Berkeley
Lior Pachter – Mathematics; Mol., Cell, Biol, UC Berkeley
Bernie Lo – Director, Medical Ethics, Department of Medicine, UCSF

Rasmus Nielsen, Michael I. Jordan, Ian Holmes, Kimmen Sjölander, Yun Song,
Monty Slatkin, Terry Speed, Mark van der Laan, Richard Karp, Bernd Sturmfels,
Steven Evans, Elizabeth Purdom, Haiyan Huang, Peter Bickel, Susan Marqusee,
Michael Eisen, Lisa Barcellos, Rachel Brem, Tom Alber
