2. TODAY’S TALK
The computable phenotypic profile
Exome analysis for disease diagnosis
Crossing the species divide
What is GOOD phenotyping?
Chronological considerations
6. 6% OF THE GENERAL POPULATION SUFFERS FROM
A RARE DISORDER
6% of patients contacting the NIH Office of
Rare Disorders do not have a diagnosis
7. THE YET-TO-BE DIAGNOSED PATIENT
Known disorders not recognized during prior evaluations?
Atypical presentation of known disorders?
Combinations of several disorders?
Novel, unreported disorder?
8. THE CHALLENGE: INTERPRETATION OF DISEASE CANDIDATES
What’s in the box?
How are candidates identified?
How do they compare?
Prioritized candidates, functional validation: C1, C2, C3, C4, ...
Phenotypes: P1, P2, P3, …
Genotype: G1, G2, G3, G4, …
Pathogenicity, frequency, protein interactions, gene expression, gene networks, epigenomics, metabolomics….
Environments: E1, E2, E3, E4 …
9. MATCHING PATIENTS TO DISEASES
Patient
Disease X
Differential diagnosis with similar but non-matching phenotypes is difficult
Flat back of head Hypotonia
Abnormal skull morphology Decreased muscle mass
10. SEARCHING FOR PHENOTYPES USING
TEXT ALONE IS INSUFFICIENT
OMIM Query                  # Records
“large bone”                785
“enlarged bone”             156
“big bone”                  16
“huge bones”                4
“massive bones”             28
“hyperplastic bones”        12
“hyperplastic bone”         40
“bone hyperplasia”          134
“increased bone growth”     612
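The query counts above show how free-text search splits one concept across many phrasings. A minimal Python sketch of the alternative is to normalize every phrasing to a single identifier before searching. The identifier `EX:0000001` and the synonym table are hypothetical placeholders for illustration, not real OMIM or HPO entries.

```python
# Hypothetical synonym table: every phrasing from the slide maps to one
# made-up concept ID (EX:0000001 is a placeholder, not a real ontology term).
SYNONYMS = {
    "EX:0000001": [
        "large bone", "enlarged bone", "big bone", "huge bones",
        "massive bones", "hyperplastic bones", "hyperplastic bone",
        "bone hyperplasia", "increased bone growth",
    ],
}

# Invert the table once so each lookup is a single dictionary access.
LOOKUP = {
    phrase: term_id
    for term_id, phrases in SYNONYMS.items()
    for phrase in phrases
}


def normalize(query):
    """Return the concept ID for a free-text query, or None if unknown."""
    return LOOKUP.get(query.strip().lower())


if __name__ == "__main__":
    for q in ["Big bone", "bone hyperplasia", "small bone"]:
        print(q, "->", normalize(q))
```

With such a table, all nine phrasings retrieve the same record set instead of nine different ones.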
13. DISEASE X IS A COLLECTION OF NODES
Each disease is associated with different phenotype nodes in the graph
Disease X
14. EACH DISEASE IS ANNOTATED WITH A PHENOTYPIC PROFILE
Chromosome 21 Trisomy
Down’s Syndrome: Failure to thrive; Umbilical hernia; Broad hands; Abnormal ears; Flat head
15. PHENOTYPE “BLAST”: WHICH PHENOTYPIC PROFILE IS GRAPHICALLY MOST SIMILAR?
Disease X
Patient
Disease Y
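One simple way to make "graphically most similar" concrete is to expand each profile with all of its ancestors in the ontology graph and compare the expanded sets. The sketch below uses Jaccard similarity over those ancestor closures. This is a stand-in measure chosen for brevity, not necessarily the one used in the talk, and the `PARENTS` edges are a toy graph invented for illustration, not the real HPO hierarchy.

```python
# Toy is_a graph (child -> direct parents). Hypothetical edges for
# illustration only; this is not the real HPO hierarchy.
PARENTS = {
    "Flat back of head": ["Abnormal skull morphology"],
    "Abnormal skull morphology": ["Phenotypic abnormality"],
    "Hypotonia": ["Abnormal muscle physiology"],
    "Decreased muscle mass": ["Abnormal muscle morphology"],
    "Abnormal muscle physiology": ["Abnormality of the musculature"],
    "Abnormal muscle morphology": ["Abnormality of the musculature"],
    "Abnormality of the musculature": ["Phenotypic abnormality"],
    "Broad hands": ["Abnormality of the hand"],
    "Abnormality of the hand": ["Phenotypic abnormality"],
}


def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS.get(current, []))
    return seen


def closure(profile):
    """Expand a profile to every node it implies in the graph."""
    expanded = set()
    for term in profile:
        expanded |= ancestors(term)
    return expanded


def jaccard(profile_a, profile_b):
    """Jaccard similarity of the two ancestor closures."""
    a, b = closure(profile_a), closure(profile_b)
    return len(a & b) / len(a | b)


patient = {"Flat back of head", "Hypotonia"}
disease_x = {"Abnormal skull morphology", "Decreased muscle mass"}
disease_y = {"Broad hands"}

if __name__ == "__main__":
    print("exact term overlap with X:", patient & disease_x)
    print("patient vs Disease X:", jaccard(patient, disease_x))
    print("patient vs Disease Y:", jaccard(patient, disease_y))
```

The patient and Disease X share no exact terms, yet the graph comparison still ranks Disease X above Disease Y because the profiles meet at shared ancestors.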
17. THE HUMAN PHENOTYPE ONTOLOGY
Used to annotate:
• Patients
• Disorders/Diseases
• Genotypes
• Genes
• Sequence variants
In human
Example terms: Reduced pancreatic beta cells; Abnormality of pancreatic islet cells; Abnormality of endocrine pancreas physiology; Pancreatic islet cell adenoma; Insulinoma; Multiple pancreatic beta-cell adenomas; Abnormality of exocrine pancreas physiology
Köhler et al. Nucleic Acids Res. 2014 Jan 1;42(1):D966-74.
18. WHY DO WE NEED THE HUMAN
PHENOTYPE ONTOLOGY?
Winnenburg and Bodenreider, ISMB PhenoDay, 2014
How does HPO relate to other clinical vocabularies?
19. EXOME ANALYSIS
Recessive, de novo filters
Remove off-target variants, common variants, and variants not in known disease-causing genes
http://compbio.charite.de/PhenIX/
Target panel of 2741 known Mendelian disease genes
Compare phenotype profiles using data from: HGMD, ClinVar, OMIM, Orphanet
Zemojtel et al. Sci Transl Med 3 September 2014: Vol. 6, Issue 252, p.252ra123
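The filtering and ranking steps on this slide can be sketched as a two-stage function. This is an illustration under stated assumptions, not the PhenIX implementation: the `Variant` fields, the 1% frequency cutoff, and the per-gene `phenotype_score` dictionary are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    gene: str
    on_target: bool    # falls inside the captured target region
    frequency: float   # population allele frequency, 0.0 to 1.0


def filter_variants(variants, panel_genes, max_frequency=0.01):
    """Keep rare, on-target variants in genes from the disease panel.

    max_frequency is an assumed cutoff, not a value from the talk.
    """
    return [
        v for v in variants
        if v.on_target
        and v.frequency <= max_frequency
        and v.gene in panel_genes
    ]


def rank_genes(variants, phenotype_score):
    """Order the surviving genes by phenotype similarity, best first."""
    genes = {v.gene for v in variants}
    return sorted(genes, key=lambda g: phenotype_score.get(g, 0.0), reverse=True)


# Toy input: gene symbols are used as labels only; GENE_X is made up.
panel = {"NF1", "PTPN11", "CHD7"}
variants = [
    Variant("NF1", True, 0.0001),
    Variant("PTPN11", True, 0.0005),
    Variant("CHD7", True, 0.2),       # too common
    Variant("GENE_X", True, 0.0001),  # not in the panel
    Variant("NF1", False, 0.0),       # off-target
]
scores = {"NF1": 0.4, "PTPN11": 0.9}  # hypothetical similarity scores

if __name__ == "__main__":
    kept = filter_variants(variants, panel)
    print(rank_genes(kept, scores))
```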
20. CONTROL PATIENTS WITH KNOWN
MUTATIONS
Inheritance  Average rank  Genes
AD           1.7           ACVR1, ATL1, BRCA1, BRCA2, CHD7 (4), CLCN7, COL1A1, COL2A1, EXT1, FGFR2 (2), FGFR3, GDF5, KCNQ1, MLH1 (2), MLL2/KMT2D, MSH2, MSH6, MYBPC3, NF1 (6), P63, PTCH1, PTH1R (2), PTPN11 (2), SCN1A, SOS1, TRPS1, TSC1, WNT10A
AR           5             ATM, ATP6V0A2, CLCN1 (2), LRP5, PYCR1, SLC39A4
X            1.8           EFNB1, MECP2 (2), DMD, PHF6
52 patients with diagnosed rare diseases
21. PHENIX HELPED DIAGNOSE 11/40 PATIENTS
global developmental delay (HP:0001263)
delayed speech and language development (HP:0000750)
motor delay (HP:0001270)
proportionate short stature (HP:0003508)
microcephaly (HP:0000252)
feeding difficulties (HP:0011968)
congenital megaloureter (HP:0008676)
cone-shaped epiphysis of the phalanges of the hand (HP:0010230)
sacral dimple (HP:0000960)
hyperpigmented/hypopigmented macules (HP:0007441)
hypertelorism (HP:0000316)
abnormality of the midface (HP:0000309)
flat nose (HP:0000457)
thick lower lip vermilion (HP:0000179)
thick upper lip vermilion (HP:0000215)
full cheeks (HP:0000293)
short neck (HP:0000470)
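A profile written as "label (HP:id)" lines, like the one above, is already close to machine-readable. The following sketch parses such lines into an ID-to-label mapping. The function name `parse_profile` and the sample text are illustrative, and the pattern assumes seven-digit HPO identifiers as they appear on the slide.

```python
import re

# Three lines copied from the slide, used here as sample input.
PROFILE_TEXT = """\
global developmental delay (HP:0001263)
microcephaly (HP:0000252)
short neck (HP:0000470)
"""

# A label, then an HPO identifier in parentheses at the end of the line.
LINE_PATTERN = re.compile(r"^(?P<label>.+?)\s*\((?P<id>HP:\d{7})\)\s*$")


def parse_profile(text):
    """Map each HPO identifier to its label; skip lines that do not match."""
    profile = {}
    for line in text.splitlines():
        match = LINE_PATTERN.match(line.strip())
        if match:
            profile[match.group("id")] = match.group("label")
    return profile


if __name__ == "__main__":
    for term_id, label in parse_profile(PROFILE_TEXT).items():
        print(term_id, label)
```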
22. WHAT ABOUT THE PATIENTS WE CAN’T
SOLVE?
HOW DO WE UNDERSTAND RARE
DISEASE ETIOLOGY AND DISCOVER
TREATMENTS?
30. PROBLEM: EACH ORGANISM USES DIFFERENT VOCABULARIES
Figure: “lung” as represented in the Mammalian Phenotype, Mouse Anatomy, FMA, and Human development vocabularies
Terms shown: lung; lobular organ; parenchymatous organ; solid organ; pleural sac; thoracic cavity organ; thoracic cavity; abnormal lung morphology; abnormal respiratory system morphology; abnormal pulmonary acinus morphology; abnormal pulmonary alveolus morphology; lung alveolus; organ system; respiratory system; lower respiratory tract; alveolar sac; pulmonary acinus; lung bud; respiratory primordium; pharyngeal region
Relations shown: develops_from, part_of, is_a (SubClassOf), surrounded_by
31. SOLUTION: BRIDGING SEMANTICS
Mungall et al. (2012). Genome Biology, 13(1), R5. doi:10.1186/gb-2012-13-1-r5
Classes shown: anatomical structure; endoderm of foregut; lung bud; lung; respiration organ; organ; foregut; alveolus; alveolus of lung; organ part; endoderm; pulmonary acinus; alveolar sac; lung primordium; swim bladder; respiratory primordium
Source-specific terms: FMA:lung; MA:lung; MA:lung alveolus; FMA: pulmonary alveolus; EHDAA: lung bud; GO: respiratory gaseous exchange; NCBITaxon: Mammalia; NCBITaxon: Actinopterygii
Relations shown: is_a (taxon equivalent), develops_from, part_of, is_a (SubClassOf), capable_of, only_in_taxon
Köhler et al. (2014) F1000Research 2:30
Haendel et al. (2014) JBMS 5:21 doi:10.1186/2041-1480-5-21
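The bridging idea can be sketched as a lookup from species-specific identifiers to a shared, species-neutral class. The pairings below are one reading of the figure's "is_a (taxon equivalent)" links and should be treated as assumptions; the function names are illustrative.

```python
# Species-specific term -> species-neutral bridging class.
# Pairings are an assumed reading of the slide's figure.
TAXON_EQUIVALENT = {
    "FMA:lung": "lung",
    "MA:lung": "lung",
    "FMA:pulmonary alveolus": "alveolus of lung",
    "MA:lung alveolus": "alveolus of lung",
    "EHDAA:lung bud": "lung bud",
}


def bridge(term):
    """Return the species-neutral class, or the term itself if unmapped."""
    return TAXON_EQUIVALENT.get(term, term)


def same_structure(term_a, term_b):
    """True when two terms resolve to the same bridging class."""
    return bridge(term_a) == bridge(term_b)


if __name__ == "__main__":
    print(same_structure("FMA:lung", "MA:lung"))
    print(same_structure("FMA:pulmonary alveolus", "MA:lung alveolus"))
    print(same_structure("FMA:lung", "EHDAA:lung bud"))
```

Once human and mouse terms resolve to the same class, annotations made in either vocabulary can be compared directly.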
32. PROBLEM: EACH SPECIES MAKES DIFFERENT G2P ASSOCIATIONS
=> Web application for model phenotyping and G2P validation
33. INTEGRATED GENOTYPE-2-PHENOTYPE DATA IN MONARCH
Also in the system: Rat; IMPC; GO annotations; Coriell cell lines; OMIA; MPD; Yeast; CTD; GWAS;
Panther, Homologene orthologs; BioGrid interactions; Drugbank; AutDB; Allen Brain …157 sources
Coming soon: Animal QTLs for pig, cattle, chicken, sheep, trout, dog, horse
Columns: Species, Data source, Genes, Genotypes, Variants, Phenotype annotations, Diseases
mouse MGI 13,433 59,087 34,895 271,621
fish ZFIN 7,612 25,588 17,244 81,406
fly Flybase 27,951 91,096 108,348 267,900
worm Wormbase 23,379 15,796 10,944 543,874
human HPOA 112,602 7,401
human OMIM 2,970 4,437 3,651
human ClinVar 19,694 111,294 252,838 4,056
human KEGG 2,509 3,927 1,159
human ORPHANET 3,113 5,690 3,064
human CTD 7,414 23,320 4,912
34. EXOMISER: DIAGNOSING UDP_930 USING A PHENOTYPICALLY SIMILAR MOUSE
UDP_930/29 phenotypes: Chronic acidosis; Neonatal hypoglycemia; Osteopenia; Short stature
Sms tm1a(EUCOMM)Wtsi: Decreased circulating potassium level; Decreased circulating glucose level; Decreased bone mineral density; Decreased body length
In common: Abnormal ion homeostasis; Decreased circulating glucose level; Decreased bone mineral density; Short stature
Robinson et al. (2013). Genome Res, doi:10.1101/gr.160325.113
35. EXOMISER: COMBINING PHENOTYPIC
SIMILARITY WITH OTHER DATA
MED21
MAU2
MED8
MED26
Recurrent otitis
media
Spasticity
Esotropia
Cerebral palsy
Conductive
hearing
impairment
Limitation of joint
mobility
Strabismus
Hypertonia
Abnormality of
the middle ear
Abnormal joint
mobility
Strabismus
Abnormality of
central motor
function
UDP_2146/56
phenotypes
Brachmann-de
Lange syndrome
NIPBL
MED23
?
CCNC
Contractures of
the joints of the
lower limbs
Hypertonicity
CDK8
36. UDP CASES ANALYZED WITH EXOMISER
=> Using genotype, phenotype, PPI, and inheritance together provides the best prioritization
37. ANALYSIS OF UNSOLVED UDP CASES
4 families now have a diagnosis, including one novel disease-gene association: York Platelet syndrome and STIM1
Strong candidates identified for 19 families that are now undergoing functional validation through mouse and zebrafish modeling
Several hundred UDP cases now being analyzed using Exomiser and cross-species phenotype data
38. HOW DOES THE CLINICIAN KNOW THEY’VE
PROVIDED ENOUGH PHENOTYPING?
How many annotations…?
How many different categories?
How many within each?
39. SCHWARTZ-JAMPEL SYNDROME, TYPE I
Image credit: Viljoen and Beighton, J Med Genet. 1992
Caused by mutation in HSPG2, which encodes a proteoglycan
~100 phenotype annotations
40. EVALUATION METHOD
Create a variety of “derived” diseases
More general (depth)
Remove subset(s) (breadth)
Introduce noise
Assess the change in similarity between the derived disease and its parent.
Ask questions:
Is the derived disease considered similar to
original?
…or more similar to a different disease?
Is it distinguishable beyond random?
Are there any specific factors that influence
similarity?
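The evaluation questions above reduce to one measurement: where does the parent disease rank when the derived profile is compared against every disease? The sketch below uses plain Jaccard similarity over term sets as a stand-in for the semantic similarity measure, and the disease profiles (`D1` to `D3`, terms `p1` to `p7`) are invented toy data.

```python
def jaccard(a, b):
    """Set overlap between two profiles (stand-in similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_of_parent(derived, parent_name, diseases):
    """1-based rank of the parent when all diseases are sorted by similarity.

    Ties keep dictionary insertion order because sorted() is stable.
    """
    scores = {name: jaccard(derived, profile) for name, profile in diseases.items()}
    ordered = sorted(scores, key=scores.get, reverse=True)
    return ordered.index(parent_name) + 1


# Toy disease profiles; names and terms are placeholders.
diseases = {
    "D1": {"p1", "p2", "p3", "p4"},
    "D2": {"p3", "p4", "p5"},
    "D3": {"p6", "p7"},
}

# Derived from D1 by removing one annotation (breadth reduction).
derived_missing = diseases["D1"] - {"p1"}
# Derived from D1 by dropping p2 and adding a noise term that belongs to D2.
derived_noisy = (diseases["D1"] - {"p2"}) | {"p5"}

if __name__ == "__main__":
    print(rank_of_parent(derived_missing, "D1", diseases))
    print(rank_of_parent(derived_noisy, "D1", diseases))
```

In this toy data the profile with a missing annotation still ranks its parent first, while the noisy profile becomes more similar to a different disease, which is the failure mode the second question asks about.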
41. FINDING THE PHENOTYPE GRAPH IN
COMMON
The most specific phenotypic profile in common
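A minimal sketch of finding the most specific profile in common: expand both profiles to their ancestor closures, intersect them, and drop any shared term that is an ancestor of another shared term. The `PARENTS` edges are a toy graph invented for illustration, not the real HPO hierarchy.

```python
# Toy is_a graph (child -> direct parents); hypothetical edges only.
PARENTS = {
    "Flat back of head": ["Abnormal skull morphology"],
    "Abnormal skull morphology": ["Phenotypic abnormality"],
    "Hypotonia": ["Abnormal muscle physiology"],
    "Decreased muscle mass": ["Abnormal muscle morphology"],
    "Abnormal muscle physiology": ["Abnormality of the musculature"],
    "Abnormal muscle morphology": ["Abnormality of the musculature"],
    "Abnormality of the musculature": ["Phenotypic abnormality"],
}


def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS.get(current, []))
    return seen


def closure(profile):
    """Expand a profile to every node it implies in the graph."""
    expanded = set()
    for term in profile:
        expanded |= ancestors(term)
    return expanded


def most_specific_common(profile_a, profile_b):
    """Shared nodes that are not an ancestor of any other shared node."""
    common = closure(profile_a) & closure(profile_b)
    return {
        term for term in common
        if not any(other != term and term in ancestors(other) for other in common)
    }


patient = {"Flat back of head", "Hypotonia"}
disease_x = {"Abnormal skull morphology", "Decreased muscle mass"}

if __name__ == "__main__":
    print(sorted(most_specific_common(patient, disease_x)))
```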
42. METHOD: DERIVE BY CATEGORY
REMOVAL
Remove annotations that are subclasses of a
single high-level node
Repeat for each 1° subclass
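Category removal can be sketched as a filter over the profile: drop every annotation that falls under one first-level category, then repeat for each category in turn. The graph below is a toy hierarchy invented for illustration, and `ROOT` stands in for the ontology's top-level node.

```python
ROOT = "Phenotypic abnormality"

# Toy is_a graph (child -> direct parents); hypothetical edges only.
PARENTS = {
    "Abnormality of the head": [ROOT],
    "Abnormality of the musculature": [ROOT],
    "Abnormality of limbs": [ROOT],
    "Flat back of head": ["Abnormality of the head"],
    "Abnormal ears": ["Abnormality of the head"],
    "Hypotonia": ["Abnormality of the musculature"],
    "Broad hands": ["Abnormality of limbs"],
}


def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS.get(current, []))
    return seen


def derive_by_category_removal(profile, category):
    """Drop every annotation that is the category or one of its subclasses."""
    return {term for term in profile if category not in ancestors(term)}


def all_category_derivations(profile):
    """One derived profile per first-level subclass of the root."""
    categories = sorted(c for c, parents in PARENTS.items() if ROOT in parents)
    return {c: derive_by_category_removal(profile, c) for c in categories}


profile = {"Flat back of head", "Abnormal ears", "Hypotonia", "Broad hands"}

if __name__ == "__main__":
    for category, derived in all_category_derivations(profile).items():
        print(category, "->", sorted(derived))
```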
46. SEMANTIC SIMILARITY ALGORITHMS ARE ROBUST IN THE FACE OF MISSING INFORMATION
On average, 92% of derived diseases are most similar to the original disease
Severity of impact follows proportion of phenotype
Plots: Similarity of Derived Disease to Original; Derived Disease Profile Rank
47. METHOD: DERIVE BY LIFTING
Iteratively map each class to their direct
superclass(es)
Keep only leaf nodes
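Lifting can be sketched as replacing every term with its direct superclasses and then discarding any term that is an ancestor of another term in the result, so only the leaves remain. The graph is a toy hierarchy invented for illustration, and the function names are hypothetical.

```python
# Toy is_a graph (child -> direct parents); hypothetical edges only.
PARENTS = {
    "Flat back of head": ["Abnormal skull morphology"],
    "Abnormal skull morphology": ["Phenotypic abnormality"],
    "Hypotonia": ["Abnormal muscle physiology"],
    "Abnormal muscle physiology": ["Abnormality of the musculature"],
    "Abnormality of the musculature": ["Phenotypic abnormality"],
}


def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS.get(current, []))
    return seen


def lift_once(profile):
    """Map each term to its direct superclass(es), then keep only leaves."""
    lifted = set()
    for term in profile:
        # A term with no parents (the root) stays where it is.
        lifted.update(PARENTS.get(term, [term]))
    return {
        term for term in lifted
        if not any(other != term and term in ancestors(other) for other in lifted)
    }


def lift(profile, steps):
    """Apply lift_once repeatedly to get progressively more general profiles."""
    for _ in range(steps):
        profile = lift_once(profile)
    return profile


profile = {"Flat back of head", "Hypotonia"}

if __name__ == "__main__":
    for steps in range(4):
        print(steps, sorted(lift(profile, steps)))
```

Each step yields a more general derived profile, which is what makes it possible to measure how similarity degrades as specificity is lost.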
48. SEMANTIC SIMILARITY ALGORITHMS ARE SENSITIVE TO SPECIFICITY OF INFORMATION
Severity of impact increases with more-general phenotypes
Plots: Similarity of Derived Disease to Original; Derived Disease Profile Rank
59. CONCLUSIONS
Phenotypic data can be represented using ontologies to support improved comparisons within and across species.
For known disease-gene associations, comparison to human phenotype data is effective at variant prioritization.
For unknown disease-gene associations, the expansion of phenotypic coverage using model organisms greatly improves variant prioritization.
Phenotype breadth is recommended to buffer lack of information; very specific phenotypes are ALSO necessary to ensure quality matches.
60. FUTURE WORK
Add additional variables to the semantic similarity algorithm, e.g. negation, environment, chronology
Validate existing animal models for recapitulation of disease
Further characterize organism-specific phenotypes
Add many more non-model organisms to the analysis
61. ACKNOWLEDGMENTS
NIH-UDP: William Bone, Murat Sincan, David Adams, Amanda Links, Joie Davis, Neal Boerkoel, Cyndi Tifft, Bill Gahl
OHSU: Nicole Vasilesky, Matt Brush, Bryan Laraway, Shahim Essaid, Kent Shefchek
Garvan: Tudor Groza
Lawrence Berkeley: Nicole Washington, Suzanna Lewis, Chris Mungall
UCSD: Jeff Grethe, Chris Condit, Anita Bandrowski, Maryann Martone
U of Pitt: Chuck Boromeo, Vincent Agresti, Becky Boes, Harry Hochheiser
Sanger: Anika Oehlrich, Jules Jacobson, Damian Smedley
Toronto: Marta Girdea, Sergiu Dumitriu, Heather Trang, Bailey Gallinger, Orion Buske, Mike Brudno
JAX: Cynthia Smith
Charité: Sebastian Köhler, Sandra Doelken, Sebastian Bauer, Peter Robinson
Funding:
NIH Office of Director: 1R24OD011883
NIH-UDP: HHSN268201300036C, HHSN268201400093P