Deep phenotyping to aid identification of coding & non-coding rare disease variants


Whole-exome sequencing has revolutionized disease research, but many cases remain unsolved because ~100-1000 candidates remain after removing common or non-pathogenic variants. We present Genomiser to prioritize coding and non-coding variants by leveraging phenotype data encoded with the Human Phenotype Ontology and a curated database of non-coding Mendelian variants. Genomiser is able to identify causal regulatory variants as the top candidate in 77% of simulated whole genomes.


  1. Deep phenotyping to aid identification of coding & non-coding rare disease variants. Melissa Haendel, PhD. March 2017. @monarchinit @ontowonka haendel@ohsu.edu
  2. Acknowledgments. Charite: Max Schubach, Sebastian Koehler. Univ of Milan: Giorgio Valentini. RTI: Jim Balhoff. OHSU: Kent Shefchek, John Letaw, Julie McMurry, Nicole Vasilevsky, Matt Brush, Tom Conlin, Dan Keith. Genomics England/Queen Mary: Damian Smedley, Julius Jacobsen. Jackson Laboratory: Peter Robinson. Stanford: Shruti Marwaha, Matthew Wheeler, Euan Ashley. Lawrence Berkeley: Chris Mungall, Suzanna Lewis, Jeremy Nguyen, Seth Carbon. Garvan: Tudor Groza. https://monarchinitiative.org/page/team
  3. The genome is sequenced, but we still don’t know very much about what it does. 3,398 OMIM Mendelian diseases have no known genetic basis. At least 120,000* ClinVar variants have no known pathogenicity. (*This is more than twice what it was in 2016!)
  4. Prevailing clinical genomic pipelines leverage only a tiny fraction of the available data. Diagram labels: PATIENT EXOME / GENOME; PATIENT CLINICAL PHENOTYPES; PUBLIC GENOMIC DATA; PUBLIC CLINICAL PHENOTYPE, DISEASE DATA; POSSIBLE DISEASES; DIAGNOSIS & TREATMENT; PATIENT ENVIRONMENT; PUBLIC ENVIRONMENT, DISEASE DATA; PATIENT OMICS PHENOTYPES; PUBLIC OMICS PHENOTYPES, CORRELATIONS; Under-utilized data.
  5. The Human Phenotype Ontology. Phenotype terms shown: Hyposmia; Abnormality of globe location; Abnormal eye morphology; Motor neuron atrophy; Deeply set eyes. Related classes shown: eyeball of camera-type eye; sensory perception of smell; motor neuron (CL). Counts shown: 34571 annotations in 22 species; 157534 phenotype annotations; 2150 phenotype annotations.
      • 11,813 phenotype terms
      • 127,125 rare disease - phenotype annotations
      • 136,268 common disease - phenotype annotations
      bit.ly/hpo-paper
  6. Adding other species’ data helps fill knowledge gaps in the human genome
  7. More species = more coverage. Total: 19,008 human protein-coding genes (number in ExAC DB as per Lek et al. Nature 2016). Coverage figures shown: 14,779 (78%); 9,739 (51%); combined = 89%, from 2,195 + 7,544 + 7,235 = 16,974 (union of coverage in any species). Even inclusion of just four species boosts phenotypic coverage of genes by 38 percentage points (51% → 89%). Mungall et al Nucleic Acids Research bit.ly/monarch-nar-2016
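The coverage figures on this slide can be checked with a few lines of arithmetic. The sketch below assumes (an inference from the numbers, not something the slide states) that 2,195, 7,544 and 7,235 are the three regions of a two-set Venn diagram: genes with phenotype data in human only, in both human and other species, and in other species only. All variable names are illustrative.

```python
# Figures are taken from the slide; the Venn-region labels are an assumption.
TOTAL_GENES = 19_008   # human protein-coding genes (ExAC, Lek et al. 2016)
human_only = 2_195     # assumed: phenotype data in human only
shared = 7_544         # assumed: phenotype data in human and other species
other_only = 7_235     # assumed: phenotype data in other species only

human = human_only + shared               # genes covered by human data
other = shared + other_only               # genes covered by other species
union = human_only + shared + other_only  # genes covered by any species


def pct(n: int) -> int:
    """Share of all protein-coding genes, rounded to a whole percent."""
    return round(100 * n / TOTAL_GENES)


print(human, pct(human))        # 9739 51
print(other, pct(other))        # 14779 78
print(union, pct(union))        # 16974 89
print(pct(union) - pct(human))  # 38 (percentage points gained)
```

Under that reading the three regions reproduce every number on the slide, which is why the 38% boost is best read as percentage points rather than a relative increase.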
  8. Phenotypic profile matching
  9. Combining G2P data for variant prioritization. Steps shown: whole exome; remove off-target and common variants; variant score from allele freq and pathogenicity; phenotype score from phenotypic similarity; PHIVE score to give final candidates; Mendelian filters.
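To make the idea of combining a variant score with a phenotype score concrete, here is a minimal sketch. It is not Exomiser's actual scoring: the `Candidate` fields, the `MAX_ALLELE_FREQ` cutoff, the rarity weighting and the plain average are all assumptions chosen for illustration, and the gene names and numbers are made up.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    gene: str
    allele_freq: float      # population allele frequency, 0..1
    pathogenicity: float    # predicted pathogenicity, 0..1
    phenotype_score: float  # phenotypic similarity to the patient, 0..1


MAX_ALLELE_FREQ = 0.01  # hypothetical cutoff for "common" variants


def variant_score(c: Candidate) -> float:
    """Toy variant score: rarer and more pathogenic variants score higher."""
    rarity = 1.0 - min(c.allele_freq / MAX_ALLELE_FREQ, 1.0)
    return rarity * c.pathogenicity


def combined_score(c: Candidate) -> float:
    """Toy combination: plain average of variant and phenotype scores."""
    return (variant_score(c) + c.phenotype_score) / 2.0


def prioritize(candidates):
    """Drop common variants, then rank the rest by combined score."""
    rare = [c for c in candidates if c.allele_freq < MAX_ALLELE_FREQ]
    return sorted(rare, key=combined_score, reverse=True)


# Made-up candidates, for illustration only.
candidates = [
    Candidate("GENE_A", 0.0001, 0.90, 0.20),
    Candidate("GENE_B", 0.0005, 0.70, 0.90),
    Candidate("GENE_C", 0.2000, 0.95, 0.95),  # common, so filtered out
]
ranked = prioritize(candidates)
for c in ranked:
    print(c.gene, round(combined_score(c), 3))
```

In this toy example GENE_B outranks GENE_A even though its variant looks less damaging, because its phenotype match is much stronger. That is the effect the slide attributes to including phenotype data.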
  10. Exomiser results for UDP diagnosed patients. Inclusion of phenotype data improves variant prioritization. In 60% of the first 1000 genomes at GEL, Exomiser predicts the top candidate. In 86% of cases, Exomiser predicts within the top 5.
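Figures such as "top candidate" and "within top 5" are top-k rates over a set of solved cases. A minimal sketch of that metric follows; the function name and the list of ranks are made up for illustration and are not the UDP or GEL results.

```python
def top_k_rate(ranks, k):
    """Fraction of cases whose known causal variant is ranked within the top k."""
    if not ranks:
        raise ValueError("ranks must not be empty")
    return sum(1 for r in ranks if r <= k) / len(ranks)


# Made-up ranks of the causal variant in ten hypothetical solved cases.
ranks = [1, 1, 3, 1, 7, 2, 1, 12, 1, 4]

print(top_k_rate(ranks, 1))  # 0.5
print(top_k_rate(ranks, 5))  # 0.8
```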
  11. Example case solved by Exomiser. Columns: phenotypic profile; genes. Entries shown: heterozygous, missense mutation, STIM-1, N/A; heterozygous, missense mutation, STIM-1, N/A; Stim1Sax/Sax. Ranked STIM-1 variant maximally pathogenic based on cross-species G2P data, in the absence of traditional data sources. http://bit.ly/exomiser
  12. How to make sense of whole genomes, when there are 3.5 billion base pairs and so little is known about non-coding regions? bit.ly/genomiser-2016
  13. 1) Gather all evidence at each position (3.5B)
      • ancestral conservation
      • GC content
      • Max methylation, acetylation, trimethylation levels
      • DNAse hypersensitivity
      • Enhancer attributes (robust, permissive)
      • # overlapping transcription factor binding sites
      • # rare variants (< 0.5% AF) +/- 500 nt
      • # common variants (> 0.5% AF) +/- 500 nt
      • Overlapping CNVs (ISCA, dbVar, DGV)
      • (… 26 features in total)
      bit.ly/genomiser-2016
  14. 2) Predict negative controls. > 5% prevalence; highly conserved in ancestral genomes; 14.7 M putative non-deleterious positions. bit.ly/genomiser-2016
  15. 3) Hand-curate positives from literature. We curated 453 regulatory mutations judged as pathogenic by reported phenotypes (HPO) and other metrics. bit.ly/genomiser-2016
  16. 4) Address positive-negative imbalance. 14.7 M putative non-deleterious positions vs. 453 known regulatory mutations: 36,000 negative examples are available for every positive one. bit.ly/genomiser-2016
  17. Synthetically oversample positives, & undersample negatives (14.7 M putative non-deleterious; 453 known regulatory mutations)
      1) Partition negatives into 100 groups
      2) Add to each negative group, all 453 known positives
      3) In each group, oversample positives AND undersample negatives
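The three steps on this slide can be sketched with the standard library alone. This is an illustration under stated assumptions, not the authors' implementation: the oversampling here interpolates between random pairs of positives (a simplification of SMOTE, which interpolates towards nearest neighbours), and the function names, `factor`, `neg_per_pos` and the toy data sizes are all hypothetical.

```python
import random


def partition(items, n_groups):
    """Split items into n_groups lists by taking every n_groups-th element."""
    return [items[i::n_groups] for i in range(n_groups)]


def oversample(positives, factor, rng):
    """Return the positives plus synthetic points between random pairs.

    Simplification: SMOTE proper interpolates towards k-nearest neighbours.
    """
    synthetic = []
    for _ in range(len(positives) * (factor - 1)):
        a, b = rng.sample(positives, 2)
        t = rng.random()
        synthetic.append(tuple(x + t * (y - x) for x, y in zip(a, b)))
    return positives + synthetic


def undersample(negatives, n_keep, rng):
    """Keep a random subset of at most n_keep negatives."""
    return rng.sample(negatives, min(n_keep, len(negatives)))


def build_training_sets(negatives, positives, n_groups=100,
                        factor=2, neg_per_pos=3, seed=0):
    """One (positives, negatives) training set per partition of the negatives."""
    rng = random.Random(seed)
    sets = []
    for group in partition(negatives, n_groups):   # step 1
        pos = oversample(positives, factor, rng)   # steps 2 and 3
        neg = undersample(group, neg_per_pos * len(pos), rng)  # step 3
        sets.append((pos, neg))
    return sets


# Toy data: 10,000 negatives and 10 positives with two features each.
data_rng = random.Random(42)
negatives = [(data_rng.random(), data_rng.random()) for _ in range(10_000)]
positives = [(data_rng.random() + 1.0, data_rng.random() + 1.0) for _ in range(10)]

training_sets = build_training_sets(negatives, positives)
pos0, neg0 = training_sets[0]
print(len(training_sets), len(pos0), len(neg0))  # 100 20 60
```

Every training set contains all of the original positives but only a small, different slice of the negatives, so each set is far more balanced than the full data. The slide does not say what is done with the sets afterwards; one natural use is to train one classifier per set and combine their predictions.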
  18. Strongest predictors of deleterious mutation
      • Higher DNAse hypersensitivity
      • Greater methylation
      • Richer GC content
      • Higher ratio of rare:common variation
      • Higher conservation
      bit.ly/genomiser-2016
  19. 5) Benchmark using synthetic genomes
      • 10,235 simulated disease genomes using 1000 Genomes Data
      • Novel Regulatory Mendelian Mutation (ReMM) scoring method
      Genomiser + ReMM outperforms other methods/tools across non-coding region types. bit.ly/genomiser-2016
  20. www.monarchinitiative.org. Leadership: Melissa Haendel, Chris Mungall, Peter Robinson, Tudor Groza, Damian Smedley, Sebastian Köhler, Julie McMurry. Funding: NIH Office of Director: 2R24OD011883; NHGRI UDP: HHSN268201300036C, HHSN268201400093P;