Haendel clingenetics.3.14.14



  1. 1. Expanding the Clinical Phenotype Space with Semantics and Model Systems Melissa Haendel March 14th, 2014 Updates in Clinical Genetics 2014
  2. 2. Outline: issues in candidate prioritization; computational techniques for comparing phenotypes; Undiagnosed Disease Program semantic phenotyping; minimum phenotype requirements; tools leveraging phenotypes
  3. 3. The Challenge: Interpretation of Disease Candidates. What’s in the box? How are candidates identified? How do they compare? [Diagram: genotype info (G1, G2, G3, G4, …) and phenotypes (P1, P2, P3, …) pass through a black box (?) to give prioritized candidates, models, and functional validation (M1, M2, M3, M4, ...). Also shown: pathogenicity, frequency, protein interactions, gene expression, gene networks, epigenomics, metabolomics, ….]
  4. 4. Candidate gene prioritization. [Diagram: candidate interpretation combines an individual's information with biomedical knowledge, across genetic information, phenotypic information, and gene/gene product info.] Individual's information: phenotypes collected for individual patients; sequences from an individual, family, or related group. Biomedical knowledge: human reference sequences (e.g. reference sequence, 1K genome data, genomic location); community phenotype data (e.g. literature, MODs, KOMP2, OMIM, EHRs, GWAS, ClinVar, disease-specific repositories, etc.); pathway, functional (GO), gene expression, OMICS data, protein-protein interactions, orthologs. Methods: variant calling (e.g. GATK); pathogenicity/impact calling (e.g. VAAST, SIFT); phenotypic comparison methods; enrichment analysis (e.g. GATACA, Galaxy); network module analysis; combined variant + phenotype candidate reporting (e.g. Exomiser).
  5. 5. Models recapitulate various phenotypic aspects of disease. Genotype -> phenotype: B6.Cg-Alms1foz/foz/J -> increased weight, adipose tissue volume, glucose homeostasis altered. ALMS1 (NM_015120.4) [c.10775delC] + [-] -> obesity, diabetes mellitus, insulin resistance. kcnj11c14/c14; insrt143/+ (AB) -> increased food intake, hyperglycemia, insulin resistance.
  6. 6. Searching for phenotypes using text alone is insufficient. OMIM query (number of records): “large bone” (785); “enlarged bone” (156); “big bone” (16); “huge bones” (4); “massive bones” (28); “hyperplastic bones” (12); “hyperplastic bone” (40); “bone hyperplasia” (134); “increased bone growth” (612).
  7. 7. Problem: Clinical and model phenotypes are described differently
  8. 8. “Expanding” the phenotypic coverage of the human genome. [Bar chart: % human coding genes (0% to 100%) for OMIM and OMIM+GWAS; segments: human only, human+ortholog, ortholog only.] Five model organisms (mouse, zebrafish, fly, yeast, rat) provide almost 80% phenotypic coverage of the human genome.
  9. 9. How can we take advantage of this model organism phenotype data?
  10. 10. Outline: issues in candidate prioritization; computational techniques for comparing phenotypes; Undiagnosed Disease Program semantic phenotyping; minimum phenotype requirements; tools leveraging phenotypes
  11. 11. Using ontologies to compare phenotypes across species Washington, N. L., Haendel, M. A., Mungall, C. J., Ashburner, M., Westerfield, M., & Lewis, S. E. (2009). Linking Human Diseases to Animal Models Using Ontology-Based Phenotype Annotation. PLoS Biol, 7(11). doi:10.1371/journal.pbio.1000247
  12. 12. What is an ontology? A set of logically defined, inter-related terms used to annotate data. Use of common or logically related terms across databases enables integration. Relationships between terms allow annotations to be grouped in scientifically meaningful ways. Reasoning software enables computation of inferred knowledge. Groups of annotations can be compared using semantic similarity algorithms.
  13. 13. An ontology provides the logical basis of classification. Definition: any sense organ that functions in the detection of smell is an olfactory sense organ. In logical form: olfactory sense organ = sense organ AND capable_of some detection of smell.
  14. 14. Classifying. Asserted: nose is_a sense organ; nose capable_of some detection of smell. Definition: olfactory sense organ = sense organ AND capable_of some detection of smell. Inferred: nose => olfactory sense organ. These are necessary and sufficient conditions.
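
The classification step on the two slides above can be sketched in a few lines. This is a toy illustration only, not how an OWL reasoner is implemented: the dictionaries `ASSERTED` and `DEFINED` and the function `infer_superclasses` are hypothetical names, and the `eye` and `lung` entries are added purely for contrast.

```python
# Toy knowledge base: each class maps to a set of (relation, filler) facts.
# All names here are illustrative; no real ontology file is loaded.
ASSERTED = {
    "nose": {("is_a", "sense organ"), ("capable_of", "detection of smell")},
    "eye": {("is_a", "sense organ"), ("capable_of", "detection of light")},
    "lung": {("is_a", "organ"), ("capable_of", "respiratory gaseous exchange")},
}

# Defined class: the conditions are necessary AND sufficient, so anything
# that satisfies all of them can be classified under the defined class.
DEFINED = {
    "olfactory sense organ": {
        ("is_a", "sense organ"),
        ("capable_of", "detection of smell"),
    },
}


def infer_superclasses(term):
    """Return the defined classes whose conditions are all met by `term`."""
    facts = ASSERTED[term]
    return sorted(name for name, conds in DEFINED.items() if conds <= facts)


inferred = {term: infer_superclasses(term) for term in ASSERTED}
print(inferred)
```

Only `nose` meets both conditions, so it is the only class placed under `olfactory sense organ`; nothing about the nose had to be asserted by hand beyond its basic properties.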
  15. 15. Representing phenotypes
  16. 16. Human Phenotype Ontology. Used to annotate patients, disorders, genotypes, genes, and sequence variants in human. Example terms: Reduced pancreatic beta cells; Abnormality of pancreatic islet cells; Abnormality of endocrine pancreas physiology; Pancreatic islet cell adenoma; Insulinoma; Multiple pancreatic beta-cell adenomas; Abnormality of exocrine pancreas physiology. Köhler et al. The Human Phenotype Ontology project: linking molecular biology and disease through phenotype data. Nucleic Acids Res. 2014 Jan 1;42(1):D966-74.
  17. 17. Mammalian Phenotype Ontology. Used to annotate and query genotypes, alleles, and genes in mice. Example terms: abnormal pancreatic beta cell mass; abnormal pancreatic beta cell morphology; abnormal pancreatic islet morphology; abnormal endocrine pancreas morphology; abnormal pancreatic beta cell differentiation; abnormal pancreatic alpha cell morphology; abnormal pancreatic alpha cell differentiation; abnormal pancreatic alpha cell number. Smith et al. (2005). The Mammalian Phenotype Ontology as a tool for annotating, analyzing and comparing phenotypic information. Genome Biol, 6(1). doi:10.1186/gb-2004-6-1-r7
  18. 18. Post-composed models of phenotype annotation. Entity + Quality: Anatomy: head + small size; Anatomy: heart + edematous; Anatomy: ventral mandibular arch + thick; Gene Ontology: swim bladder inflation + arrested.
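
A post-composed annotation can be modeled as an entity term paired with a quality term. The sketch below is a minimal illustration under two assumptions: the entity-quality pairing follows the order in which the terms appear on the slide, and the `EQ` class and its field names are hypothetical rather than any database's actual schema.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class EQ:
    """One post-composed phenotype: an entity term plus a quality term."""
    entity_ontology: str  # source of the entity term, e.g. "Anatomy"
    entity: str
    quality: str

    def label(self) -> str:
        return f"{self.entity}: {self.quality}"


# Pairing assumed from the slide's ordering of entities and qualities.
annotations = [
    EQ("Anatomy", "head", "small size"),
    EQ("Anatomy", "heart", "edematous"),
    EQ("Anatomy", "ventral mandibular arch", "thick"),
    EQ("Gene Ontology", "swim bladder inflation", "arrested"),
]

# Because entity and quality are separate fields, annotations can be
# grouped or queried on either axis without parsing free text.
by_ontology = defaultdict(list)
for eq in annotations:
    by_ontology[eq.entity_ontology].append(eq.label())

for source, labels in by_ontology.items():
    print(source, "->", labels)
```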
  19. 19. A human phenotype example Abnormality of the eye Vitreous hemorrhage Abnormal eye morphology Abnormality of the cardiovascular system Abnormal eye physiology Hemorrhage of the eye Internal hemorrhage Abnormality of the globe Abnormality of blood circulation
  20. 20. Problem: data silos. [Diagram: lung-related terms held separately in Mammalian Phenotype, Mouse Anatomy, FMA, and human development ontologies. Terms shown include: abnormal lung morphology, abnormal respiratory system morphology, abnormal pulmonary acinus morphology, abnormal pulmonary alveolus morphology; lung, lung alveolus, alveolar sac, pulmonary acinus, lower respiratory tract, respiratory system, organ system; lobular organ, parenchymatous organ, solid organ, pleural sac, thoracic cavity organ, thoracic cavity; lung bud, respiratory primordium, pharyngeal region. Relations: develops_from, part_of, is_a (SubClassOf), surrounded_by.]
  21. 21. Solution: bridging semantics. [Diagram: bridging classes (anatomical structure, organ, organ part, respiration organ, lung, lung bud, lung primordium, respiratory primordium, foregut, endoderm, endoderm of foregut, alveolus, alveolus of lung, alveolar sac, pulmonary acinus, swim bladder) linked to FMA:lung, MA:lung, MA:lung alveolus, FMA:pulmonary alveolus, EHDAA:lung bud, GO:respiratory gaseous exchange, NCBITaxon:Mammalia, and NCBITaxon:Actinopterygii. Relations: is_a (taxon equivalent), develops_from, part_of, is_a (SubClassOf), capable_of, only_in_taxon.] Mungall, C. J., Torniai, C., Gkoutos, G. V., Lewis, S. E., & Haendel, M. A. (2012). Uberon, an integrative multi-species anatomy ontology. Genome Biology, 13(1), R5. doi:10.1186/gb-2012-13-1-r5. Köhler et al. (2014) Construction and accessibility of a cross-species phenotype ontology along with gene annotations for biomedical research. F1000Research 2014, 2:30
  22. 22. Phenotype representation requires more than “phenotype ontologies”. Disease: type II diabetes mellitus (DOID:9352); disease & phenotype data. Gene Ontology: glucose metabolism (GO:0006006); gene/protein function data. Chemical: glucose (CHEBI:17234), pyruvate (CHEBI:15361); metabolomics, toxicogenomics data. Cell: pancreatic beta cell (CL:0000169); transcriptomic data.
  23. 23. OWLsim: Phenotype similarity across patients or organisms. [Diagram: three lists of seven terms each.] First list: unstable posture, constipation, neuronal loss in substantia nigra, shuffling gait, resting tremors, REM disorder, hyposmia. Second list: poor rotarod performance, decreased gut peristalsis, axon degeneration, decreased stride length, stereotypic behavior, abnormal EEG, failure to find food. Third list (more general terms): abnormal coordination, abnormal digestive physiology, CNS neuron degeneration, abnormal locomotion, abnormal motor function, sleep disturbance, abnormal olfaction.
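
The idea of comparing phenotype profiles through shared, more general terms can be sketched with a Jaccard score over ancestor sets. This is a simplified stand-in and not OWLsim's actual algorithm. The small `PARENTS` graph is an assumption: it pairs terms from the slide by their position in each list (for example, shuffling gait and decreased stride length under abnormal locomotion) and adds a hypothetical root called `phenotype`.

```python
# Toy is_a graph (child -> parents). Pairings are assumed from the slide's
# term order; "phenotype" is a hypothetical root added for the example.
PARENTS = {
    "shuffling gait": {"abnormal locomotion"},
    "decreased stride length": {"abnormal locomotion"},
    "constipation": {"abnormal digestive physiology"},
    "decreased gut peristalsis": {"abnormal digestive physiology"},
    "hyposmia": {"abnormal olfaction"},
    "failure to find food": {"abnormal olfaction"},
    "abnormal locomotion": {"phenotype"},
    "abnormal digestive physiology": {"phenotype"},
    "abnormal olfaction": {"phenotype"},
    "phenotype": set(),
}


def ancestors(term):
    """Return the term plus every term reachable by following is_a links."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def profile_closure(profile):
    """Union of the ancestor sets of all terms in a phenotype profile."""
    closure = set()
    for term in profile:
        closure |= ancestors(term)
    return closure


def jaccard(profile_a, profile_b):
    """Shared ancestors divided by all ancestors of the two profiles."""
    a, b = profile_closure(profile_a), profile_closure(profile_b)
    return len(a & b) / len(a | b)


patient = ["shuffling gait", "constipation", "hyposmia"]
mouse = ["decreased stride length", "decreased gut peristalsis",
         "failure to find food"]

exact_matches = set(patient) & set(mouse)
score = jaccard(patient, mouse)
print(f"exact term matches: {len(exact_matches)}, Jaccard: {score:.2f}")
```

The two profiles share no term verbatim, yet they score above zero because each pair of terms meets at a common parent. Measures used in practice usually also weight shared terms by how informative they are.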
  24. 24. Outline: issues in candidate prioritization; computational techniques for comparing phenotypes; Undiagnosed Disease Program semantic phenotyping; minimum phenotype requirements; tools leveraging phenotypes
  25. 25. General exome analysis. Single exome. Remove off-target and common variants, filter on predicted deleteriousness, candidate gene strategies. Prioritize based on known genes, allele frequency, and pathogenicity. Homozygous recessive, X-linked, de novo (if trio).
  26. 26. Undiagnosed Disease Program exome analysis. Family exome data. Prioritize based on alignment quality, allele frequency, predicted deleteriousness, and PubMed. Filter using SNP chip data, Mendelian models of inheritance, and population frequency.
  27. 27. exome analysis. Recessive, de novo filters. Remove off-target variants, common variants, and variants not in known disease-causing genes. Zemojtel et al., manuscript submitted
  28. 28. Remove off-target and common variants Recessive, De novo filters es/databases/exomiser/ Robinson et al. Exomiser exome analysis
  29. 29. Current UDP analysis with semantic phenotyping Family Exome Data Combined Score Phenotype Data Filter using SNP chip data, Mendelian models of inheritance, and population frequency
  30. 30. Benchmarking. Data: 1,092 unaffected exomes; 28,516 disease-associated variants; 100,000 simulated exomes. Steps: annotate variants; remove off-target, synonymous, and common (>1% MAF) variants (plus optional inheritance model filtering); prioritize based on combined score.
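
The filter-then-prioritize steps above can be sketched as follows. This is a minimal illustration, not the benchmarked pipeline: the `Variant` fields, the placeholder gene names, the example scores, and in particular the `combined_score` function (a plain average) are all assumptions, since the slide does not say how the combined score is computed.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    gene: str
    on_target: bool
    synonymous: bool
    maf: float              # minor allele frequency as a fraction (0.01 = 1%)
    variant_score: float    # hypothetical 0-1 pathogenicity/frequency score
    phenotype_score: float  # hypothetical 0-1 phenotypic relevance score


def keep(v, max_maf=0.01):
    """Drop off-target, synonymous, and common (>1% MAF) variants."""
    return v.on_target and not v.synonymous and v.maf <= max_maf


def combined_score(v):
    # Assumption: a simple mean of the two scores, for illustration only.
    return (v.variant_score + v.phenotype_score) / 2


def prioritize(variants):
    """Rank genes by the best combined score among their kept variants."""
    best = {}
    for v in filter(keep, variants):
        score = combined_score(v)
        if score > best.get(v.gene, -1.0):
            best[v.gene] = score
    return sorted(best.items(), key=lambda item: item[1], reverse=True)


# Made-up example data with placeholder gene names.
variants = [
    Variant("GENE_A", True, False, 0.0, 0.9, 0.8),
    Variant("GENE_B", True, False, 0.002, 0.95, 0.2),
    Variant("GENE_C", True, True, 0.0, 0.1, 0.9),    # synonymous: removed
    Variant("GENE_D", True, False, 0.05, 0.9, 0.9),  # common: removed
    Variant("GENE_E", False, False, 0.0, 0.9, 0.9),  # off-target: removed
]

ranked = prioritize(variants)
for gene, score in ranked:
    print(gene, round(score, 3))
```

In this toy data GENE_B has the highest variant score but ranks below GENE_A once phenotypic relevance is taken into account, which is the kind of reordering a combined score is meant to produce.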
  31. 31. Phenotype and variant data synergistically improve exome interpretation. [Bar chart: % exomes with disease gene as top hit (0 to 100) for all diseases, autosomal dominant, autosomal recessive (hom), and autosomal recessive (compound het); series: variant, phenotypic relevance, PHIVE.]
  32. 32. Results. Correct gene as top-scoring hit in 68.3% of exomes, out of an average of 272 post-filtering candidate genes. Improvement of between 1.8- and 5.1-fold in the percentage of candidate genes correctly ranked in first place, compared to using pathogenicity and frequency data alone. Shows the utility of structured phenotype data for computational analysis.
  33. 33. UDP Experiment. [Flow diagram elements: UDP diploid aligned cohort VCF file (18 families); Mendelian filters; Mendelian filtered files (per family); phenotype profiles; Exomiser, PhenIX, and phenotype-only analyses; VCF files with phenotype and variant scores (per family).]
  34. 34. Top de novo candidates for patient 2543 (UDP2543). Exomiser: STIM1, CYP2D6, MUC5B. Phenotype only: ITGA7, PLEC, STIM1, PTGS1, TTN. PhenIX: STIM1, RB1, DLEC1, CHRNB4, MUC5B, REPIN1, NBPF8, GPRIN3, TMEFF1, FLT3LG, OSM, FZD10, MUC12. Candidate variant: gene STIM1; variant chr11:g.4045175A>T [0/1]; MAF (ESP or 1000g) 0%; consequence p.I115T; predicted pathogenicity (SIFT, PolyPhen, MutTaster; 0-1) 1.
  35. 35. UDP2543: phenotypic similarity Patient Stim1 het mouse OMIM:612783 (IMMUNODEFICIENCY 10) - hom STIM1 mutations OMIM:160565 (MYOPATHY, TUBULAR AGGREGATE) - het STIM1 mutations Impaired platelet aggregation abnormal platelet activation Thrombocytopenia Thrombocytopenia decreased platelet cell number Thrombocytopenia Myopathy Myopathy Myopathy Generalized hypotonia Muscular hypotonia Proximal muscle weakness Petechiae increased bleeding time Autoimmune hemolytic anemia Delayed gross motor development Epistaxis increased bleeding time Gower sign
  36. 36. STIM1
  37. 37. Suspected genetic disease DRG sequencing Deep phenotyping Top ranked candidates Clinical rounds Exclude candidate gene Sanger validation Cosegregation studies Diagnosis Fails Passes Inconsistent Consistent Reconsider short list Choose best candidate Variant Analysis HGMD MAF (dbSNP, ESP) ClinVar Annotation sources: Predicted pathogenicity Variant class Location in DRG target region Prediction criteria: Computational Phenotype Analysis HPO Semantic similarity Mode of inheritance OMIM, Orphanet, MGI... Annotation sources: Ontology: Prediction criteria: MP Proposed workflow for undiagnosed diseases
  38. 38. What constitutes an adequate phenotype annotation for an undiagnosed patient?
  39. 39. Defining minimum phenotype standard: 1. Is the annotation specificity similar to or better than the corpus of available phenotype data? 2. Is the number of annotations/patient similar or better? 3. How does the ontology and annotation set differ across anatomical systems in terms of granularity? Does this change specificity requirements for phenotypic profiles? 4. How does use of NOT annotations help further specify the uniqueness of an undiagnosed patient? 5. How do onset, temporal ordering, and severity affect specificity?
  40. 40. UDP phenotype annotation metrics. UDP annotations have a similar information content (IC) and a larger average number of annotations per disease/patient.
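
Information content is commonly defined as the negative log of how often a term (or any of its descendants) is used in a corpus, so rarer terms score higher. The sketch below assumes that standard definition; the slides do not give the exact formula used. The counts in `ANNOTATED` and the corpus size `TOTAL` are invented for illustration, and the function names are hypothetical.

```python
import math

# Hypothetical corpus: number of diseases (out of TOTAL) annotated with each
# term or any of its descendants. The counts are made up for illustration.
TOTAL = 1000
ANNOTATED = {
    "phenotypic abnormality": 1000,
    "abnormality of the eye": 250,
    "vitreous hemorrhage": 4,
}


def information_content(term):
    """IC = -log2(p), where p is the fraction of the corpus using the term."""
    return -math.log2(ANNOTATED[term] / TOTAL)


def mean_ic(profile):
    """Average IC of a phenotype profile: a rough specificity measure."""
    return sum(information_content(t) for t in profile) / len(profile)


for term in ANNOTATED:
    print(f"{term}: IC = {information_content(term):.2f}")

vague_profile = ["phenotypic abnormality", "abnormality of the eye"]
specific_profile = ["abnormality of the eye", "vitreous hemorrhage"]
print(mean_ic(vague_profile), mean_ic(specific_profile))
```

Comparing the mean IC and the number of annotations per patient against the same figures for a reference corpus is one way to approach the first two questions in the minimum phenotype standard.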
  41. 41. Anatomical annotation distribution in the corpus. Nervous system, skeletal system, and immune system are highest => these categories require greater specificity and numbers of annotations
  42. 42. Annotation specificity meter What about common traits, like blue eyes or acne?
  43. 43. Making the patient phenotype profiles as good as can be. Total requests from UDP: 614. Number of requests assigned to HPO terms: 423 (example: Chronic limb pain -> limb pain). Number of terms that need consideration by UDP: 145 (example: Expressive language -> delay? Increase? Abnormal?). Number of requests that belong in other parts of the patient record: 68 (example: Abnormal aCGH 12q21.1- 12q.2 (662 kb duplication) paternal origin -> move to genotype information portion of the record). Contributing requests to the ontologies is a community effort, and quality profiling helps make our tools work better for everyone.
  44. 44. Limitations and ongoing work: adding negation to the algorithm; temporal ordering of phenotypes; leveraging severity, expressivity, and penetrance data
  45. 45. Additional tools leveraging structured phenotype data
  46. 46. The Monarch system
  47. 47. Monarch phenotype data (species, source: unique genotypes/variants; disease/phenotype associations). Mouse, MGI: 53,573; 406,618. Zebrafish, ZFIN: 14,703; 75,698. C. elegans, Wormbase: 116,106; 411,154. Fruit fly, Flybase: 98,596; 265,329. Human, OMIM: 26,372; 27,798. Human, Orphanet: 2,872; 5,095. Human, ClinVar: 62,437; 178,424. Also in the system: Rat; IMPC; GO annotations; Coriell cell lines; OMIA; MPD; Yeast; CTD; GWAS; Panther, Homologene orthologs; BioGrid interactions; Drugbank; AutDB; Allen Brain …157 sources to date. Coming soon: animal QTLs for pig, cattle, chicken, sheep, trout, dog, horse
  48. 48. ModelCompare: How do the models recapitulate the disease? Late-onset Parkinson’s Phenotypes Mouse Phenotypes
  49. 49. Network phenotype distribution. [Network diagram nodes: Slc6a3, Dbh, Slc18a2, Uchl1, Uchl3, Snca, Mfn2, Cox8a, Th, Cx IV, tyrosine metabolism.] Late-onset Parkinson’s phenotypes (subset): bradykinesia, depression, dysphagia, Lewy bodies.
  50. 50. Phenotypes in common. [Network diagram nodes: Slc6a3, Dbh, Slc18a2, Uchl1, Uchl3, Snca, Mfn2, Cox8a, Th, Cx IV, tyrosine metabolism.] Late-onset Parkinson’s phenotypes (subset): bradykinesia, depression, dysphagia, Lewy bodies. Also shown: abnormal gait, ataxia, paralysis, bradykinesia, abnormal locomotion, abnormality of central motor function.
  51. 51. Finding collaborators for functional validation Patient Phenotype profile Phenotyping experts
  52. 52. Exome Walker: network-based exploration of phenotypically similar diseases. Exploits vicinity in the protein interaction network between phenotypically related diseases and uses this to rank exome candidates. Large boost in rankings of candidate genes using 250 disease gene-families. Prototype version online, manuscript in preparation. [Figure: Bare Lymphocyte Syndrome Type 1 protein-interaction network.] Walking the interactome for prioritization of candidate disease genes. Am J Hum Genet. 2008 Apr;82(4):949-58. doi: 10.1016/j.ajhg.2008.02.013.
  53. 53. PhenoViz: Integrate all human, mouse, and fish data to understand CNVs. Desktop application for differential diagnostics in CNVs. Explains manifestations of CNV diseases based on the genes contained in the CNV; e.g., supravalvular aortic stenosis in Williams syndrome can be explained by haploinsufficiency for elastin. Doubles the number of explanations using model data. Doelken, Köhler, et al. (2013) Dis Model Mech 6:358-72
  54. 54. Conclusions. Cross-species phenotype data can be used to compute semantic similarity. Structured phenotype data for rare and undiagnosed disease patients can aid candidate evaluation. We are experimenting with these methods for UDP patient phenotypes to aid candidate prioritization, identify models, explore mechanisms, and find collaborators.
  55. 55. Acknowledgments. NIH-UDP: William Bone, Murat Sincan, David Adams, Amanda Links, David Draper, Neal Boerkoel, Cyndi Tifft, Bill Gahl. OHSU: Nicole Vasilesky, Matt Brush. Lawrence Berkeley: Nicole Washington, Suzanna Lewis, Chris Mungall. UCSD: Amarnath Gupta, Jeff Grethe, Anita Bandrowski, Maryann Martone. U of Pitt: Chuck Boromeo, Jeremy Espino, Harry Hochheiser. Sanger: Anika Oehlrich, Jules Jacobson, Damian Smedley. Toronto: Marta Girdea, Sergiu Dumitriu, Mike Brudno. JAX: Cynthia Smith. Charité: Sebastian Köhler, Sandra Doelken, Sebastian Bauer, Peter Robinson. Funding: NIH Office of Director: 1R24OD011883; NIH-UDP: HHSN268201300036C