Deep phenotyping to aid identification of coding & non-coding rare disease variants

Whole-exome sequencing has revolutionized disease research, but many cases remain unsolved because ~100-1000 candidates remain after removing common or non-pathogenic variants. We present Genomiser to prioritize coding and non-coding variants by leveraging phenotype data encoded with the Human Phenotype Ontology and a curated database of non-coding Mendelian variants. Genomiser is able to identify causal regulatory variants as the top candidate in 77% of simulated whole genomes.

  1. Deep phenotyping to aid identification of coding & non-coding rare disease variants. Melissa Haendel, PhD. March 2017. @monarchinit @ontowonka haendel@ohsu.edu
  2. Acknowledgments. Charite: Max Schubach, Sebastian Koehler. Univ of Milan: Giorgio Valentini. RTI: Jim Balhoff. OHSU: Kent Shefchek, John Letaw, Julie McMurry, Nicole Vasilevsky, Matt Brush, Tom Conlin, Dan Keith. Genomics England/Queen Mary: Damian Smedley, Julius Jacobsen. Jackson Laboratory: Peter Robinson. Stanford: Shruti Marwaha, Matthew Wheeler, Euan Ashley. Lawrence Berkeley: Chris Mungall, Suzanna Lewis, Jeremy Nguyen, Seth Carbon. Garvan: Tudor Groza. https://monarchinitiative.org/page/team
  3. The genome is sequenced, but we still don't know very much about what it does. 3,398 OMIM Mendelian diseases have no known genetic basis. At least 120,000* ClinVar variants have no known pathogenicity. (*This is more than twice what it was in 2016.)
  4. Prevailing clinical genomic pipelines leverage only a tiny fraction of the available data. [Diagram labels: patient exome/genome; patient clinical phenotypes; public genomic data; public clinical phenotype and disease data; possible diseases; diagnosis & treatment; patient environment; public environment and disease data; patient omics phenotypes; public omics phenotypes and correlations. Legend: under-utilized data.]
  5. The Human Phenotype Ontology. [Figure labels: Hyposmia; Abnormality of globe location; Abnormal eye morphology; Motor neuron atrophy; Deeply set eyes; eyeball of camera-type eye; sensory perception of smell; motor neuron (CL). Figure counts: 34,571 annotations in 22 species; 157,534 phenotype annotations; 2,150 phenotype annotations.]
       • 11,813 phenotype terms
       • 127,125 rare disease - phenotype annotations
       • 136,268 common disease - phenotype annotations
      bit.ly/hpo-paper
  6. Adding other species' data helps fill knowledge gaps in the human genome
  7. More species = more coverage. Of the 19,008 human protein-coding genes in ExAC DB (as per Lek et al., Nature 2016), the chart shows 9,739 (51%), 14,779 (78%), and combined 16,974 (89%), the union of coverage in any species (2,195 + 7,544 + 7,235 = 16,974). Even inclusion of just four species boosts phenotypic coverage of genes by 38% (51% → 89%). Mungall et al., Nucleic Acids Research, bit.ly/monarch-nar-2016
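The coverage figures on slide 7 can be cross-checked with a little arithmetic. The sketch below assumes the three numbers 2,195, 7,544 and 7,235 are the regions of a two-set Venn diagram (one set only, overlap, other set only); the slide text does not label them, so the variable names here are hypothetical.

```python
TOTAL_GENES = 19_008  # human protein-coding genes in ExAC, as quoted on the slide

# Assumed Venn regions (the labels are a guess, the numbers are from the slide).
only_first, overlap, only_second = 2_195, 7_544, 7_235

first_set = only_first + overlap               # genes covered by the first data set
second_set = overlap + only_second             # genes covered by the second data set
union = only_first + overlap + only_second     # genes covered by either data set


def pct(count: int) -> int:
    """Share of all ExAC protein-coding genes, rounded to a whole percent."""
    return round(100 * count / TOTAL_GENES)


for label, count in [("first set", first_set),
                     ("second set", second_set),
                     ("union", union)]:
    print(f"{label}: {count:,} genes = {pct(count)}%")

gain = pct(union) - pct(first_set)
print(f"gain from combining: {gain} percentage points")
```

Under that reading the totals come out to 9,739 (51%), 14,779 (78%) and 16,974 (89%), which matches the slide, and the quoted 38% boost is the difference between 51% and 89% in percentage points.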
  8. Phenotypic profile matching
  9. Combining G2P data for variant prioritization. [Pipeline diagram labels: whole exome; remove off-target and common variants; variant score from allele frequency and pathogenicity; phenotype score from phenotypic similarity; PHIVE score to give final candidates; Mendelian filters.]
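The idea on slide 9, combining a variant score with a phenotype score to rank candidates, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `Candidate` fields, the `MAX_AF` cutoff, the rarity weighting and the simple average are hypothetical and are not Exomiser's actual scoring formula.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    gene: str
    allele_freq: float      # population allele frequency, 0..1
    pathogenicity: float    # predicted pathogenicity, 0..1
    phenotype_score: float  # phenotypic similarity to the patient, 0..1


MAX_AF = 0.01  # hypothetical cutoff above which a variant is treated as common


def variant_score(c: Candidate) -> float:
    """Zero for common variants, otherwise pathogenicity weighted by rarity."""
    if c.allele_freq > MAX_AF:
        return 0.0
    rarity = 1.0 - c.allele_freq / MAX_AF
    return rarity * c.pathogenicity


def combined_score(c: Candidate) -> float:
    """Illustrative combination: plain average of the two scores."""
    return (variant_score(c) + c.phenotype_score) / 2


def prioritize(candidates: list[Candidate]) -> list[Candidate]:
    """Best candidate first."""
    return sorted(candidates, key=combined_score, reverse=True)


# Made-up candidates with placeholder gene names.
candidates = [
    Candidate("GENE_A", allele_freq=0.0, pathogenicity=0.9, phenotype_score=0.2),
    Candidate("GENE_B", allele_freq=0.001, pathogenicity=0.7, phenotype_score=0.9),
    Candidate("GENE_C", allele_freq=0.05, pathogenicity=0.95, phenotype_score=0.95),
]
ranked = prioritize(candidates)
for c in ranked:
    print(c.gene, round(combined_score(c), 3))
```

In this toy data GENE_B outranks GENE_A despite a lower pathogenicity prediction because its phenotype match is much stronger, while GENE_C loses its variant score entirely for being too common.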
  10. Exomiser results for UDP diagnosed patients. Inclusion of phenotype data improves variant prioritization. In 60% of the first 1000 genomes at GEL, Exomiser predicts the top candidate. In 86% of cases, Exomiser predicts within the top 5.
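Figures such as "top candidate in 60%" and "within top 5 in 86%" on slide 10 are top-k rates over a set of solved cases. A minimal sketch of that calculation follows; the function name and the ranks are made up for illustration and are not the GEL or UDP data.

```python
from typing import Optional, Sequence


def top_k_rate(ranks: Sequence[Optional[int]], k: int) -> float:
    """Fraction of cases whose causal variant is ranked k or better.

    Ranks are 1-based; None means the causal variant was not returned at all.
    """
    if not ranks:
        raise ValueError("need at least one case")
    hits = sum(1 for r in ranks if r is not None and r <= k)
    return hits / len(ranks)


# Hypothetical rank of the known causal variant in ten solved cases.
example_ranks = [1, 1, 3, 1, 7, 2, 1, None, 1, 4]

top1 = top_k_rate(example_ranks, 1)
top5 = top_k_rate(example_ranks, 5)
print(f"top 1: {top1:.0%}, top 5: {top5:.0%}")
```

Cases where the causal variant is missing from the output count against the rate, which is why they are kept in the denominator.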
  11. Example case solved by Exomiser. [Table residue, columns "Phenotypic profile" and "Genes": Heterozygous, missense mutation, STIM-1, N/A; Heterozygous, missense mutation, STIM-1, N/A; Stim1Sax/Sax.] Ranked STIM-1 variant maximally pathogenic based on cross-species G2P data, in the absence of traditional data sources. http://bit.ly/exomiser
  12. How to make sense of whole genomes when there are 3.5 billion base pairs and so little is known about non-coding regions? bit.ly/genomiser-2016
  13. 1) Gather all evidence at each position (3.5B)
       • ancestral conservation
       • GC content
       • max methylation, acetylation, trimethylation levels
       • DNase hypersensitivity
       • enhancer attributes (robust, permissive)
       • # overlapping transcription factor binding sites
       • # rare variants (<0.5% AF) +/- 500 nt
       • # common variants (>0.5% AF) +/- 500 nt
       • overlapping CNVs (ISCA, dbVar, DGV)
       • (… 26 features in total)
      bit.ly/genomiser-2016
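Two of the features on slide 13 are counts of rare and common variants within 500 nt of a position. A minimal sketch of how such counts could be computed is below, assuming variants arrive as a position-sorted list of `(position, allele_frequency)` pairs. The function name and the toy data are hypothetical, and since the slide does not say how a frequency of exactly 0.5% is handled, this sketch counts it as common.

```python
from bisect import bisect_left, bisect_right


def window_counts(variants, position, flank=500, af_cutoff=0.005):
    """Count rare and common variants within +/- flank nt of a position.

    variants: list of (position, allele_frequency) tuples sorted by position.
    Returns (rare_count, common_count); AF below af_cutoff counts as rare.
    """
    positions = [pos for pos, _ in variants]
    lo = bisect_left(positions, position - flank)    # first variant inside window
    hi = bisect_right(positions, position + flank)   # one past the last inside
    in_window = variants[lo:hi]
    rare = sum(1 for _, af in in_window if af < af_cutoff)
    common = len(in_window) - rare
    return rare, common


# Made-up variants on one chromosome, sorted by position.
toy_variants = [
    (100, 0.001), (450, 0.2), (600, 0.12),
    (900, 0.004), (1500, 0.0001), (1501, 0.3),
]
rare, common = window_counts(toy_variants, position=1000)
print(rare, common)
```

Binary search keeps each lookup logarithmic in the number of variants, although the position list is rebuilt on every call here for brevity; a genome-wide scan would build it once per chromosome.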
  14. 2) Predict negative controls. > 5% prevalence. 14.7 M putative non-deleterious positions. Highly conserved in ancestral genomes. bit.ly/genomiser-2016
  15. 3) Hand-curate positives from literature. We curated 453 regulatory mutations judged as pathogenic by reported phenotypes (HPO) and other metrics. bit.ly/genomiser-2016
  16. 4) Address positive-negative imbalance. 14.7 M putative non-deleterious positions vs. 453 known regulatory mutations. ~36,000 negative examples are available for every positive one. bit.ly/genomiser-2016
  17. Synthetically oversample positives & undersample negatives (14.7 M putative non-deleterious positions; 453 known regulatory mutations)
       1) Partition negatives into 100 groups
       2) Add all 453 known positives to each negative group
       3) In each group, oversample positives AND undersample negatives
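The three steps on slide 17 can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' implementation: the oversampling below is a simplified SMOTE-style interpolation between two randomly chosen positives (real SMOTE interpolates towards nearest neighbours), and the parameters `over_factor` and `neg_per_pos` are hypothetical knobs that do not appear in the slides.

```python
import random


def partition(items, n_groups, rng):
    """Shuffle a copy of items and deal it into n_groups near-equal groups."""
    shuffled = list(items)
    rng.shuffle(shuffled)
    return [shuffled[i::n_groups] for i in range(n_groups)]


def oversample(positives, factor, rng):
    """Return the positives plus synthetic points, factor times as many in total.

    Each synthetic point lies on the segment between two random positives.
    """
    synthetic = []
    for _ in range(len(positives) * (factor - 1)):
        a, b = rng.sample(positives, 2)
        t = rng.random()
        synthetic.append([x + t * (y - x) for x, y in zip(a, b)])
    return list(positives) + synthetic


def undersample(negatives, n_keep, rng):
    """Keep a random subset of at most n_keep negatives."""
    return rng.sample(negatives, min(n_keep, len(negatives)))


def build_training_sets(positives, negatives, n_groups=100,
                        over_factor=2, neg_per_pos=3, seed=0):
    """One (positives, negatives) training set per partition of the negatives."""
    rng = random.Random(seed)
    training_sets = []
    for group in partition(negatives, n_groups, rng):
        pos = oversample(positives, over_factor, rng)          # steps 2 and 3
        neg = undersample(group, neg_per_pos * len(pos), rng)  # step 3
        training_sets.append((pos, neg))
    return training_sets


# Toy data: 5 positives and 1,000 negatives with 3 made-up features each.
data_rng = random.Random(42)
toy_pos = [[data_rng.random() for _ in range(3)] for _ in range(5)]
toy_neg = [[data_rng.random() for _ in range(3)] for _ in range(1000)]

sets = build_training_sets(toy_pos, toy_neg, n_groups=10)
print(len(sets), len(sets[0][0]), len(sets[0][1]))
```

With these toy settings every group ends up with 10 positives and 30 negatives, a 1:3 ratio instead of the original 1:200. A natural next step, not shown on the slide, would be to train one classifier per group and combine their predictions.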
  18. Strongest predictors of deleterious mutation
       • Higher DNase hypersensitivity
       • Greater methylation
       • Richer GC content
       • Higher ratio of rare:common variation
       • Higher conservation
      bit.ly/genomiser-2016
  19. 4. Benchmark using synthetic genomes
       • 10,235 simulated disease genomes using 1000 Genomes data
       • Novel Regulatory Mendelian Mutation (ReMM) scoring method
      Genomiser + ReMM outperforms other methods/tools across non-coding region types. bit.ly/genomiser-2016
  20. www.monarchinitiative.org Leadership: Melissa Haendel, Chris Mungall, Peter Robinson, Tudor Groza, Damian Smedley, Sebastian Köhler, Julie McMurry. Funding: NIH Office of Director: 2R24OD011883; NHGRI UDP: HHSN268201300036C, HHSN268201400093P;
