Computing on phenotypes across scale
and species
Chris Mungall, PhD
AMP 2015
@monarchinit @chrismungall
Patient
Genome/Exome
Diagnosis,
treatment
filtering
Genomic data
Patient
Genome/Exome
Improved
Diagnosis,
treatment
filtering
Phenome
Gene
to
Phenotype
Database
Genomic data
Hyperkeratosis,
hearing impairment,
…
Obstacles to phenome-based
interpretation
• Building a comprehensive phenomic
database
– Multiple disparate sources:
• Human Genes, Variants, etc databases
• Orthologous genes in model organisms
• Phenotype Search and Matching
• How do we utilize phenotypes in a variant filtering pipeline?
• How do we match phenotypes in different species?
• How much difference does phenotype make?
monarchinitiative.org
Interpretation requires prior
knowledge of gene-phenotype
effects
Model organisms supply ~50%
phenotypic knowledge to human genes
Other organisms provide deeper
molecular pathological perspective
SNCA (Hsap) phenotypes
• Mental deterioration
• Urinary urgency
• Lewy bodies
• Tremor
• Substantia nigra gliosis
• …
Snca (Mmus) phenotypes
• Retinal dopaminergic neuron degeneration
(OMIM)
(MGI)
• Abnormal synaptic
dopamine release
• Alpha-synuclein inclusion
body
• Dopamine neuron loss
• …
Transgenic Snca (Dmel)
phenotypes
(FlyBase)
Monarch Portal: linking human
diseases to model systems
• One stop shop for
gene-phenotype data
and analysis:
• Humans
• Models
– Data:
• Genes
• Variants
• Complex genotypes
• Phenotypes
• Disease
http://monarchinitiative.org/
Mungall, C. J., Washington, N. L., Nguyen-Xuan, J., Condit, C.,
Smedley, D., Köhler, S., … Haendel, M. A. (2015). Use of Model
Organism and Disease Databases to Support Matchmaking for
Human Disease Gene Discovery. Human Mutation, 36(10), 979–84.
doi:10.1002/humu.22857
Building the knowledge base
+ in-house curation
Phenotypes
From 60 metazoan
species
How do we search phenome
databases?
• Given a patient
phenotypic profile
• What are the relevant
genes implicated in…
– Humans?
– Model systems?
Patient
Phenome
Gene
<->
Phenotype
Database
Hyperkeratosis,
hearing impairment,
…
Candidate
genes
KRT2
GJB2
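The search described on this slide can be sketched in a few lines. This is a minimal illustration, not the Monarch implementation: the is-a hierarchy, the gene annotations and the placeholder gene GENE_X are toy assumptions, and plain Jaccard overlap of ancestor sets stands in for the information-content-weighted measures that production tools typically use.

```python
from typing import Dict, List, Set, Tuple

# Toy is-a hierarchy (child -> parents). Plain labels stand in for ontology
# term IDs; the structure is an assumption made for illustration only.
PARENTS: Dict[str, Set[str]] = {
    "phenotype": set(),
    "abnormal skin": {"phenotype"},
    "abnormal hearing": {"phenotype"},
    "abnormal heart": {"phenotype"},
    "hyperkeratosis": {"abnormal skin"},
    "palmoplantar hyperkeratosis": {"hyperkeratosis"},
    "hearing impairment": {"abnormal hearing"},
    "cardiomyopathy": {"abnormal heart"},
}

# Illustrative gene-to-phenotype annotations (hypothetical, not curated data).
GENE_PHENOTYPES: Dict[str, Set[str]] = {
    "KRT2": {"hyperkeratosis"},
    "GJB2": {"hearing impairment", "palmoplantar hyperkeratosis"},
    "GENE_X": {"cardiomyopathy"},  # placeholder for an unrelated gene
}


def ancestors(term: str) -> Set[str]:
    """Return the term plus everything reachable by following is-a links."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen


def closure(profile: Set[str]) -> Set[str]:
    """Union of the ancestor sets of every term in a phenotype profile."""
    result: Set[str] = set()
    for term in profile:
        result |= ancestors(term)
    return result


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection divided by size of the union."""
    return len(a & b) / len(a | b) if a | b else 0.0


def rank_genes(patient: Set[str]) -> List[Tuple[str, float]]:
    """Score every gene against the patient profile, best match first."""
    patient_closure = closure(patient)
    scores = [
        (gene, jaccard(patient_closure, closure(terms)))
        for gene, terms in GENE_PHENOTYPES.items()
    ]
    return sorted(scores, key=lambda item: (-item[1], item[0]))


ranked = rank_genes({"hyperkeratosis", "hearing impairment"})
for gene, score in ranked:
    print(f"{gene}\t{score:.2f}")
```

Comparing ancestor closures rather than exact terms is what lets a patient term such as hyperkeratosis match a gene annotated with the more specific palmoplantar hyperkeratosis.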
We have a common
computable language for
sequence data…
ATCTTAGCACGTTAC…
OR g.241T>C
…not so much for phenotypes
Ulcerated
paws
Palmoplantar
hyperkeratosis
Thick hand skin
Ulcerated
paws
Palmoplantar
hyperkeratosis
Thick hand skin
Abnormal
autopod skin
Ontologies:
Concepts
Inter-related in a graph
Hyperkeratosis
Ulcerated
paws
Palmoplantar
hyperkeratosis
Thick hand skin
Abnormal
autopod skin
id: HP:0000972
Synonyms: “Thick palms and soles”
Def: “Hyperkeratosis affecting the palm of
the hand and the sole of the foot”
Köhler, S., Doelken, S. C., Mungall, C. J., … Robinson, P. N. (2013). The
Human Phenotype Ontology project: linking molecular biology and
disease through phenotype data. Nucleic Acids Res., gkt1026.
doi:10.1093/nar/gkt1026
OMIM:309560
OMIM:613989
…
MP:0000578
Ctsk
Ntrk1
Lamc2
Ulcerated
paws
Palmoplantar
hyperkeratosis
Thick hand skin
Abnormal
autopod skin
id: HP:0000972
Synonyms: “Thick palms and soles”
Def: “Hyperkeratosis affecting the palm of
the hand and the sole of the foot”
OMIM:309560
OMIM:613989
…
?
Ctsk
Ntrk1
Lamc2
MP:0000578
paw
skin
hand
autopod =
epidermis
stratum
corneum
Mungall, C. J., Torniai, C., Gkoutos, G. V, Lewis, S. E., & Haendel, M.
A. (2012). Uberon, an integrative multi-species anatomy ontology.
Genome Biology, 13(1), R5. doi:10.1186/gb-2012-13-1-r5
Keratinization
(GO)
Uberon bridges multiple ontologies
Ulcerated
paws
Palmoplantar
hyperkeratosis
Thick hand skin
Abnormal
autopod skin
id: HP:0000972
Synonyms: “Thick palms and soles”
Def: “Hyperkeratosis affecting the palm of
the hand and the sole of the foot”
Ctsk
Ntrk1
Lamc2
MRCA
Phenotype
What model organism genes are relevant
for my phenotype?
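One way to picture the "MRCA phenotype" step is as a search for the most specific term that subsumes both a human and a model organism phenotype. The sketch below assumes a small hand-built hierarchy using the labels from these slides; the parent links are illustrative guesses rather than the actual HPO, MP or Uberon axioms, and the function names are hypothetical.

```python
from typing import Dict, List, Set

ROOT = "abnormal phenotype"

# Assumed is-a links between the phenotype labels shown on the slides.
PARENTS: Dict[str, Set[str]] = {
    ROOT: set(),
    "abnormal skin": {ROOT},
    "abnormal autopod skin": {"abnormal skin"},
    "hyperkeratosis": {"abnormal skin"},
    "thick hand skin": {"abnormal autopod skin"},
    "palmoplantar hyperkeratosis": {"thick hand skin", "hyperkeratosis"},
    "ulcerated paws": {"abnormal autopod skin"},
}

# Mouse genes the slide lists next to the mouse phenotype (MP:0000578).
# Pairing that ID with the label "ulcerated paws" is an assumption here.
MODEL_GENES: Dict[str, List[str]] = {
    "ulcerated paws": ["Ctsk", "Ntrk1", "Lamc2"],
}


def ancestors(term: str) -> Set[str]:
    """Return the term itself plus all of its is-a ancestors."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen


def most_specific_common_ancestors(a: str, b: str) -> Set[str]:
    """Common ancestors of a and b that are not ancestors of another one."""
    common = ancestors(a) & ancestors(b)
    return {
        term
        for term in common
        if not any(term != other and term in ancestors(other) for other in common)
    }


def model_genes_for(human_term: str) -> Dict[str, Set[str]]:
    """Map each model gene to the shared term that links it to human_term."""
    hits: Dict[str, Set[str]] = {}
    for model_term, genes in MODEL_GENES.items():
        shared = most_specific_common_ancestors(human_term, model_term)
        # A match that only meets at the root carries no information.
        if shared and shared != {ROOT}:
            for gene in genes:
                hits[gene] = shared
    return hits


shared = most_specific_common_ancestors("palmoplantar hyperkeratosis", "ulcerated paws")
print(shared)
print(model_genes_for("palmoplantar hyperkeratosis"))
```

In this toy graph the two species-specific phenotypes meet at "abnormal autopod skin", which is the role a bridging anatomy ontology plays in the slides.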
Smedley, D., Oellrich, A., Köhler, S., Ruef, B., Westerfield, M., Robinson, P., … Mungall, C. (2013). PhenoDigm: analyzing curated
annotations to associate animal models with human diseases. Database: The Journal of Biological Databases and Curation,
2013, bat025. doi:10.1093/database/bat025
Multi-phenotype search
PhenoGrid phenotype comparison
widget
Patient phenotypes
Compare patients with:
• Other patients
• Known diseases
• Models
http://monarchinitiative.org/page/phenogrid
PHenotypic Interpretation of
Variants in Exomes
Whole exome
Remove off-target and
common variants
Variant score from allele
freq and pathogenicity
Phenotype score from phenotypic similarity
(hi)PHIVE score to give final candidates
Mendelian filters
https://www.sanger.ac.uk/resources/software/exomiser/
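The pipeline on this slide can be sketched as filter, score, combine, rank. Everything concrete below is an assumption for illustration: the `Variant` fields, the 1% frequency cutoff, the rarity-weighted variant score, the equal-weight average used as the combined score, and the crude recessive filter. Exomiser's own scoring model is not reproduced here.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

MAX_ALLELE_FREQ = 0.01  # assumed cutoff for "common" variants


@dataclass(frozen=True)
class Variant:
    gene: str
    allele_freq: float    # population allele frequency, 0..1
    pathogenicity: float  # predicted deleteriousness, 0..1
    on_target: bool       # falls inside a targeted exonic region
    zygosity: str         # "het" or "hom"


def passes_filters(v: Variant, recessive: bool) -> bool:
    """Drop off-target and common variants, then apply a crude Mendelian filter."""
    if not v.on_target or v.allele_freq > MAX_ALLELE_FREQ:
        return False
    if recessive and v.zygosity != "hom":
        return False  # simplification: ignores compound heterozygotes
    return True


def variant_score(v: Variant) -> float:
    """Pathogenicity down-weighted as the allele gets more frequent."""
    rarity = 1.0 - v.allele_freq / MAX_ALLELE_FREQ
    return v.pathogenicity * rarity


def prioritise(
    variants: List[Variant],
    phenotype_scores: Dict[str, float],
    recessive: bool = False,
) -> List[Tuple[str, float]]:
    """Return genes ranked by the best combined score of their variants."""
    best: Dict[str, float] = {}
    for v in variants:
        if not passes_filters(v, recessive):
            continue
        # Hypothetical combination: plain average of the two scores.
        combined = 0.5 * (variant_score(v) + phenotype_scores.get(v.gene, 0.0))
        best[v.gene] = max(best.get(v.gene, 0.0), combined)
    return sorted(best.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical exome: GENE_A has the most damaging variant, but GENE_B
# matches the patient's phenotype profile far better.
exome = [
    Variant("GENE_A", 0.0, 0.9, True, "het"),
    Variant("GENE_B", 0.0, 0.8, True, "het"),
    Variant("GENE_C", 0.2, 1.0, True, "het"),    # too common, removed
    Variant("GENE_D", 0.0, 1.0, False, "het"),   # off-target, removed
]
pheno = {"GENE_A": 0.2, "GENE_B": 0.9, "GENE_C": 0.9, "GENE_D": 0.9}

ranking = prioritise(exome, pheno)
for gene, score in ranking:
    print(f"{gene}\t{score:.2f}")
```

With variant scores alone GENE_A would rank first; adding the phenotype score moves GENE_B to the top, which is the effect the following slide reports.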
Adding phenotype improves variant
interpretation
Robinson, P., Köhler, S., Oellrich, A., Wang, K., Mungall, C., Lewis, S.
E., … Köhler, S. (2013). Improved exome prioritization of disease
genes through cross species phenotype comparison. Genome
Research. doi:10.1101/gr.160325.113
Patient diagnosis example
Column groups: Deleteriousness, Phenotype Score
PID  Gene  MT    P2    S     Clinical Pheno         Matching Pheno                          gene  P    Var   ES    Rank
929  SMS   1.00  0.99  0.00  Osteopenia             Decreased BMD                           Sms   0.4  1.00  0.89  1/25
                             Short stature          Decreased body length
                             Neonatal hypoglycemia  Decreased circulating glucose levels
                             Acidosis               Decreased circulating potassium levels
                             Decreased body weight  Decreased body weight
Bone, W. P. et al. (2015). Computational evaluation of exome sequence
data using human and model organism phenotypes improves
diagnostic efficiency. Genet. Med., in press.
From exomes to genomes
Smedley D. et al, under review
Building up a massive phenomic
database
• Initial efforts
• Manual curation of OMIM records
• Expert biocurators and clinicians
• Lag between publication and phenotype capture
• How are we scaling up?
• Phenotypes at time of publication
• Working with patient registries
• Natural Language Processing
• Integration with Gene Ontology curation
Each case report associated with HPO profile
Robinson, P. N., Mungall, C. J., & Haendel, M. (2015). Capturing phenotypes for precision medicine.
Molecular Case Studies, 1(1), a000372. doi:10.1101/mcs.a000372
Beyond Mendelian phenotypes
• First pass
• Mendelian or ‘rare’ diseases
• Can we include a broader definition of
‘phenotype’?
• Quantitative traits, e.g. hippocampus volume
• Common disease phenotypes
• Cancer
Groza, T., … Robinson, P. N. (2015). The Human Phenotype Ontology: Semantic Unification of Common and Rare Disease. The
American Journal of Human Genetics, 1–14. doi:10.1016/j.ajhg.2015.05.020
Mining PubMed for phenotypes
F-Score: 45%
Building causal molecular
pathological models
http://create.monarchinitiative.org
http://noctua.berkeleybop.org
Conclusions
• Phenotypes are crucial for precision medicine
• Variant interpretation needs more than genome data
• Methods of incorporating phenotypes are evolving
• We need all the organisms
• The Monarch Portal integrates and organizes
gene-phenotype data
• Ontologies make phenotypes computable
• Depth and breadth of structured phenotype data are
growing
Monarch team
Lawrence Berkeley
Chris Mungall
Nicole Washington
Suzanna Lewis
Jeremy Nguyen
Seth Carbon
Charité
Peter Robinson
Sebastian Köhler
Max Schubach
Tomasz Zemojtel
U of Pittsburgh
Harry Hochheiser
Mike Davis
Joe Zhou
OHSU
Melissa Haendel
Nicole Vasilevsky
Matt Brush
Kent Shefchek
Julie McMurry
Mark Engelstad
Sanger Institute
Damian Smedley
Jules Jacobsen
Garvan
Tudor Groza
Craig McNamara
Edwin Zhang
Funding:
NIH Office of Director: 1R24OD011883
NIH-UDP: HHSN268201300036C, HHSN268201400093P
http://monarchinitiative.org
From phenomes to exposomes
• Environmental context
• Microbiome
• Drugs
Buttigieg, P. L., Morrison, N., Smith, B., Mungall, C. J., & Lewis, S. E. (2013). The environment ontology: contextualising biological
and biomedical entities. Journal of Biomedical Semantics, 4(1), 43. doi:10.1186/2041-1480-4-43

Editor's Notes

  • #2 Phenotype-Gene Associations in Variant Interpretation
  • #3 Even with increased genomic data, e.g. ExAC, it can still be hard to pinpoint the causative variant, or to be sure the real variant hasn’t been filtered out
  • #6 Human: GWAS, OMIM, ClinVar. Orthology via PANTHER v9. When put together, they bring the phenotypic coverage of human genes (either directly or inferred via orthology) up to nearly 80%. That is A LOT of coverage. How can we better tap that?
  • #7 Human: GWAS, OMIM, ClinVar. Orthology via PANTHER v9. When put together, they bring the phenotypic coverage of human genes (either directly or inferred via orthology) up to nearly 80%. That is A LOT of coverage. How can we better tap that?
  • #8 More molecular. Just as the exome is only an incomplete view of the genome, only part of the phenome is observed and measured
  • #12 We can quantify distance due to shared ancestry. Draw trees.
  • #15 Mention HPOA
  • #16 Mention HPOA
  • #17 Backbone ontology. Bridges anatomical and pathological levels
  • #33 There are a lot of people who have contributed to this work over many years. 
  • #38 If we include bridging ontologies, we can unify diseases across sources AND phenotypes across sources and organisms.
  • #39 1. Represent Human as a biological subject 2. Represent diseases as collections of nodes in the graph 3. Interoperable with other bioinformatics resources and leverage modern semantic standards