Data Translator: an Open Science Data Platform for Mechanistic Disease Discovery (mhaendel)
Architecture of the language and data translation that underlies the NCATS Biomedical Data Translator. Presented at the Fanconi Anemia Annual Meeting. http://fanconi.org/index.php/research/annual_symposium
Global Phenotypic Data Sharing Standards to Maximize Diagnostics and Mechanis... (mhaendel)
Presented at the IRDiRC 2017 conference in Paris, Feb 9th, 2017 (http://irdirc-conference.org/). This talk reviews use of the Human Phenotype Ontology for phenotype comparisons against other patients, known diseases, and animal models for diagnostic discovery. It also discusses the new Phenopackets Exchange mechanism for open phenotypic data sharing.
www.monarchinitiative.org
www.phenopackets.org
www.human-phenotype-ontology.org
The Monarch Initiative: From Model Organism to Precision Medicine (mhaendel)
NIH BD2K all-hands meeting poster November 12, 2015.
Attempts at correlating phenotypic aspects of disease with causal genetic influences are often confounded by the challenges of interpreting diverse data distributed across numerous resources. New approaches to data modeling, integration, tooling, and community practices are needed to make efficient use of these data. The Monarch Initiative is an international consortium working on the development of shared data, tools, and standards to enable direct translation of integrated genotype, phenotype, and environmental data from human and model organisms to enhance our understanding of human disease. We utilize sophisticated semantic mapping techniques across a diverse set of standardized ontologies to deeply integrate data across species, sources, and modalities. Using phenotype similarity matching algorithms across these data enables disorder prediction, variant prioritization, and patient matching against known diseases and model organisms. These similarity algorithms form the core of several innovative tools. The Exomiser enables exome variant prioritization by combining pathogenicity, frequency, inheritance, protein interaction, and cross-species phenotype data. Our Phenotype Sufficiency tool gives clinicians the ability to compare patient phenotypic profiles using the Human Phenotype Ontology to determine uniqueness and specificity in support of variant prioritization. The PhenoGrid visualization widget illustrates phenotype similarity between patients, known diseases, and model organisms. Monarch develops data models in collaboration with the community in support of the burgeoning genotype-phenotype disease research field. We have successfully used Exomiser to solve a number of undiagnosed patient cases in collaboration with the NIH Undiagnosed Diseases Program.
Ongoing development in coordination with the Global Alliance for Genomics and Health (GA4GH) and other groups will catalyze the realization of our goal of a vital translational community focused on the collaborative application of integrated genotype, phenotype, and environmental data to human disease.
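The phenotype similarity matching described above can be illustrated with a toy sketch. This is not the actual Monarch/OwlSim algorithm (which uses information-content-based semantic similarity); it simply closes each profile over its ontology ancestors and scores the Jaccard overlap. The tiny parent map and the HP-style IDs below are hypothetical.

```python
# Toy sketch of phenotype-profile comparison: close each profile over
# its ontology ancestors, then score the Jaccard overlap of the sets.

def ancestors(term, parents):
    """Return a term plus all of its ancestors under a parent map."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, []))
    return seen

def profile_similarity(profile_a, profile_b, parents):
    """Jaccard similarity of two ancestor-closed phenotype profiles."""
    closed_a = set().union(*(ancestors(t, parents) for t in profile_a))
    closed_b = set().union(*(ancestors(t, parents) for t in profile_b))
    return len(closed_a & closed_b) / len(closed_a | closed_b)

# HP:3 and HP:4 are sibling phenotypes under HP:2, which is under HP:1,
# so two profiles holding one sibling each still share two ancestors.
parents = {"HP:2": ["HP:1"], "HP:3": ["HP:2"], "HP:4": ["HP:2"]}
sim = profile_similarity({"HP:3"}, {"HP:4"}, parents)  # 2 shared / 4 total
```

Ancestor closure is what lets a mouse phenotype and a human phenotype annotated with different but related terms still register as similar.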
The Application of the Human Phenotype Ontology (mhaendel)
Presented at the II International Summer School for Rare Disease and Orphan Drug Registries, September 15-19, 2014, organized by the National Centre for Rare Diseases, Istituto Superiore di Sanità (ISS), Rome, Italy.
Note the extensive contribution by many consortium members and partners listed in the acknowledgements slide.
Why the world needs phenopacketeers, and how to be one (mhaendel)
Keynote presented at the Ninth International Biocuration Conference, Geneva, Switzerland, April 10-14, 2016
The health of an individual organism results from complex interplay between its genes and environment. Although great strides have been made in standardizing the representation of genetic information for exchange, there are no comparable standards to represent phenotypes (e.g. patient disease features, variation across biodiversity) or environmental factors that may influence such phenotypic outcomes. Phenotypic features of individual organisms are currently described in diverse places and in diverse formats: publications, databases, health records, registries, clinical trials, museum collections, and even social media. In these contexts, biocuration has been pivotal to obtaining a computable representation, but is still deeply challenged by the lack of standardization, accessibility, persistence, and computability among these contexts. How can we help all phenotype data creators contribute to this biocuration effort when the data is so distributed across so many communities, sources, and scales? How can we track contributions and provide proper attribution? How can we leverage phenotypic data from the model organism or biodiversity communities to help diagnose disease or determine evolutionary relatedness? Biocurators unite in a new community effort to address these challenges.
Envisioning a world where everyone helps solve disease (mhaendel)
Keynote presented at the Semantic Web for Life Sciences conference in Cambridge, UK, December 9th, 2015
http://www.swat4ls.org/
The talk focuses on the use of ontologies for data integration to support rare disease diagnostics, and on how very many people, unbeknownst to the patient or even to the researchers creating the data, are involved in a diagnosis.
The Human Phenotype Ontology (HPO) was developed to describe phenotypic abnormalities for “deep phenotyping”, whereby symptoms and characteristic phenotypic findings (a phenotypic profile) are captured. The HPO has been used to great success for computational phenotype comparison against known diseases, other patients, and model organisms to support diagnosis of rare disease patients. Clinicians and geneticists create phenotypic profiles based on clinical evaluation, but this is time-consuming and can miss important phenotypic features. Patients are sometimes the best source of information about symptoms that might otherwise be missed in a clinical encounter. However, the HPO primarily uses medical terminology, which can be difficult for patients and their families to understand. To make the HPO accessible to patients, we systematically added synonyms using non-expert terminology (i.e., layperson terms). Using semantic similarity, patient-recorded phenotypic profiles can be evaluated against those created clinically for undiagnosed patients to determine the improvement gained from patient-driven phenotyping, as well as how much the patient phenotyping narrows the diagnosis. This patient-centric HPO can be utilized by all: in patient-centered rare disease websites, in patient community platforms and registries, or even to post one’s hard-to-diagnose phenotypic profile on the Web.
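The layperson-synonym lookup step the abstract describes might be sketched as follows. The mapping table is a hypothetical fragment (the real lay-HPO synonym set is far larger), though the HP IDs follow the real HPO identifier pattern.

```python
# Sketch: resolving patient-entered layperson terms to HPO classes via
# a synonym index. The index here is a tiny illustrative fragment.

LAY_SYNONYMS = {
    "low muscle tone": "HP:0001252",   # Hypotonia
    "droopy eyelid": "HP:0000508",     # Ptosis
    "long fingers": "HP:0001166",      # Arachnodactyly
}

def resolve_profile(patient_terms, synonym_index):
    """Map free-text layperson terms to HPO IDs; keep unmatched apart."""
    matched, unmatched = [], []
    for term in patient_terms:
        hpo_id = synonym_index.get(term.strip().lower())
        if hpo_id:
            matched.append(hpo_id)
        else:
            unmatched.append(term)
    return matched, unmatched

matched, unmatched = resolve_profile(
    ["Low muscle tone", "weird gait"], LAY_SYNONYMS)
```

The matched IDs form the patient-recorded profile that can then be compared, via semantic similarity, against clinician-recorded profiles.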
Patient-led deep phenotyping using a lay-friendly version of the Human Phenot... (mhaendel)
Presented at AMIA TBI CRI 2018.
Rare disease patients are experts in their own medical histories; they are not only among the most engaged patients, but can also themselves provision data for use in clinical evaluation. We therefore created a layperson version of our clinical deep phenotyping instrument, the Human Phenotype Ontology. Here, we evaluate the diagnostic utility of this lay-HPO and debut a new software tool for patient-led deep phenotyping.
On the frontier of genotype-2-phenotype data integration (mhaendel)
Presented at AMIA TBI 2016 BD2K Panel. A description of the Monarch Initiative's efforts to perform deep phenotyping data integration across species, facilitate exchange, and build computable G2P evidence models to aid variant interpretation.
Illuminating the Druggable Genome with Knowledge Engineering and Machine Lear... (Jeremy Yang)
Talk given at 14th Annual New Mexico BioInformatics, Science and Technology (NMBIST) Symposium, entitled Integrative Omics, on March 14-15, 2019. Most slides c/o IDG KMC PI Tudor Oprea, MD, PhD.
Visual Exploration of Clinical and Genomic Data for Patient Stratification (Nils Gehlenborg)
Talk presented at the Simons Foundation Biotech Symposium "Complex Data Visualization: Approach and Application" (12 September 2014)
http://www.simonsfoundation.org/event/complex-data-visualization-approach-and-application/
In this talk I describe how we integrated a sophisticated computational framework directly into the StratomeX visualization technique to enable rapid exploration of tens of thousands of stratifications in cancer genomics data, creating a unique and powerful tool for the identification and characterization of tumor subtypes. The tool can handle a wide range of genomic and clinical data types for cohorts with hundreds of patients. StratomeX also provides direct access to comprehensive data sets generated by The Cancer Genome Atlas Firehose analysis pipeline.
http://stratomex.caleydo.org
Poster presentation at the Rare Disease Symposium at Oregon Health & Science University in Portland, Oregon, 2015.
http://openwetware.org/wiki/OHSU_Rare_Disease_Research_Consortium_Symposium_2015
Guided visual exploration of patient stratifications in cancer genomics (Nils Gehlenborg)
Talk presented at the "Beyond the Genome 2014: Cancer Genomics" conference (10 October 2014)
http://www.beyond-the-genome.com/2014/
Cancer is a heterogeneous disease, and molecular profiling of tumors from large cohorts has enabled characterization of new tumor subtypes. This is a prerequisite for improving personalized treatment and ultimately better patient outcomes. Potential tumor subtypes can be identified with methods such as unsupervised clustering or network-based stratification, which assign patients to sets based on high-dimensional molecular profiles. Detailed characterization of identified sets and their interpretation, however, remain a time-consuming exploratory process.
To address these challenges, we have developed StratomeX (http://stratomex.caleydo.org), an interactive visualization tool that complements algorithmic approaches. StratomeX also integrates a computational framework for query-based guided exploration directly into the visualization, enabling discovery of novel relationships between patient sets and efficient generation and refinement of hypotheses about tumor subtypes. StratomeX enables analysts to efficiently compare multiple patient stratifications, to correlate patient sets with clinical information or genomic alterations, and to view the differences between molecular profiles across patient sets.
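The kind of stratification comparison StratomeX visualizes can be reduced to a contingency (overlap) matrix between two patient groupings. A minimal sketch, with made-up patient and cluster labels:

```python
# Sketch: the overlap matrix between two patient stratifications, the
# quantity StratomeX's ribbon view renders. Labels are illustrative.

from collections import Counter

def overlap_matrix(strat_a, strat_b):
    """Count patients in each pair of (set in A, set in B)."""
    return Counter((strat_a[p], strat_b[p]) for p in strat_a)

# One stratification from expression clustering, one from a mutation.
strat_by_expression = {"p1": "cluster1", "p2": "cluster1", "p3": "cluster2"}
strat_by_mutation   = {"p1": "TP53-mut", "p2": "TP53-wt",  "p3": "TP53-wt"}
counts = overlap_matrix(strat_by_expression, strat_by_mutation)
```

Large off-diagonal entries suggest that the two stratifications capture different structure in the cohort, which is exactly the kind of relationship an analyst would explore further.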
https://www.creative-bioarray.com/support/resazurin-cell-viability-assay.htm
The resazurin cell viability assay is a simple, rapid, reliable, sensitive, safe, and cost-effective measurement of cell viability.
STR DNA profiling is now a powerful, inexpensive tool that can generate unique DNA signatures, which can be used to authenticate cell lines and detect contamination by more than one cell type. This presentation covers why scientists need cell line authentication, what an STR profile is, and the STR profiling workflow from Creative Bioarray.
The human genome is full of repeated DNA sequences which come in various sizes and are classified according to the length of the core repeat units, the number of contiguous repeat units, and/or the overall length of the repeat region. DNA regions with short repeat units (usually 2-6 bp in length) are called Short Tandem Repeats (STR).
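The repeat count that an STR profile records per marker can be sketched as a simple scan for contiguous copies of the core unit. The sequence below is invented for illustration.

```python
# Sketch: count contiguous copies of a short core repeat unit at an
# STR locus; the per-marker repeat count is what an STR profile records.

def count_tandem_repeats(sequence, unit, start):
    """Number of contiguous copies of `unit` beginning at `start`."""
    n, i = 0, start
    while sequence[i:i + len(unit)] == unit:
        n += 1
        i += len(unit)
    return n

# "GATA" repeated four times, as at a tetranucleotide STR marker.
seq = "TTCGATAGATAGATAGATACCA"
n_repeats = count_tandem_repeats(seq, "GATA", 3)  # 4
```

A real profile records such counts across a panel of markers; because repeat numbers vary between individuals, the combined counts form a near-unique signature for a cell line.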
Approaches for the Integration of Visual and Computational Analysis of Biomed... (Nils Gehlenborg)
The integration of computational and statistical approaches with visualization tools is becoming crucial as biomedical data sets are rapidly growing in size. Finding efficient solutions that address the interplay between data management, algorithmic and visual analysis tools is challenging. I will discuss some of these challenges and demonstrate how we are addressing them in our Refinery Platform project (http://www.refinery-platform.org).
How to transform genomic big data into valuable clinical information (Joaquin Dopazo)
The impact of genomics in translational medicine: present view
13th October 2014, Vall d’Hebron Institute of Research (VHIR), Barcelona, Spain
Enhancing the Human Phenotype Ontology for Use by the Layperson (Nicole Vasilevsky)
Presentation at the International Conference on Biological Ontology & BioCreative, August 1-4, 2016, Corvallis, Oregon, USA.
Abstract
In rare or undiagnosed diseases, physicians rely upon genotype and phenotype information in order to compare abnormalities to other known cases and to inform diagnoses. Patients are often the best sources of information about their symptoms and phenotypes. The Human Phenotype Ontology (HPO) contains over 12,000 terms describing abnormal human phenotypes. However, the labels and synonyms in the HPO primarily use medical terminology, which can be difficult for patients and their families to understand. In order to make the HPO more accessible to non-medical experts, we systematically added new synonyms using non-expert terminology (i.e., layperson terms) to the existing HPO classes or tagged existing synonyms as layperson. As a result, the HPO contains over 6,000 classes with layperson synonyms.
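A simplified in-memory model of the synonym tagging the abstract describes might look like this. The class structure is hypothetical; the real HPO marks layperson synonyms with a synonym-type annotation in its OWL/OBO representation.

```python
# Sketch: tagging layperson synonyms on an ontology class, mirroring
# (in a much-simplified form) the lay-synonym tagging done in the HPO.

class OntologyClass:
    def __init__(self, curie, label):
        self.curie = curie
        self.label = label
        self.synonyms = []  # list of (text, synonym_type) pairs

    def add_synonym(self, text, synonym_type="exact"):
        self.synonyms.append((text, synonym_type))

    def layperson_synonyms(self):
        return [t for t, kind in self.synonyms if kind == "layperson"]

hypotonia = OntologyClass("HP:0001252", "Hypotonia")
hypotonia.add_synonym("Muscular hypotonia")                    # expert
hypotonia.add_synonym("Low muscle tone", synonym_type="layperson")
lay = hypotonia.layperson_synonyms()
```

Tagging, rather than replacing, the expert terminology is what lets the same ontology serve both clinicians and patients.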
openEHR in Research: Linking Health Data with Computational Models (Koray Atalag)
My presentation at the Medinfo 2017 openEHR Developers Workshop.
The aim was to demonstrate how openEHR supports very advanced research and analytics with examples from computational physiology and biosimulation to create patient-specific decision support.
Scientific Consensus on Brain Fingerprinting and Differing Views on the Scien... (Karlos Svoboda)
The following proposed Scientific Consensus on Brain Fingerprinting has arisen from discussions among forensic scientists, legal experts, psychophysiologists, and experts in law enforcement and national security. These discussions were initiated by Lawrence A. Farwell. This is a work in progress. Discussions of these and other related issues are ongoing. Please refer comments and suggestions to Lawrence A. Farwell at LFarwell@brainwavescience.com. The most fundamental point of consensus among scientists and other relevant experts regarding brain fingerprinting, forensic science, and science in general is that different methods produce different results. Brain fingerprinting, from the seminal Farwell and Donchin (1986; 1991) and Farwell and Smith (2001) papers to the present, has never produced an error, neither a false negative nor a false positive. Some alternative methods of applying the same brain responses in attempts to detect concealed information have resulted in 10% to 15% errors and in some cases as high as nearly 50% errors, no better than chance. Even some purported “replications” of Farwell and Donchin have in fact used fundamentally different methods. Consequently, they have failed to achieve accuracy approaching that of brain fingerprinting and, unlike brain fingerprinting, are susceptible to countermeasures. These fundamental differences in scientific methods are the reason why brain fingerprinting has been successfully applied in the field and ruled admissible in court, and these alternative methods are unsuitable for field use or application in the criminal justice system or national security. In developing this consensus, we have specified precisely the standard scientific methods that constitute brain fingerprinting and attempted to identify the specific standards that are necessary and sufficient to obtain the results that brain fingerprinting has consistently attained.
We have sought to identify differences in methods that are responsible for the widely divergent results obtained in different laboratories conducting related research. Fundamental brain fingerprinting scientific principles, methods, and scientific standards are briefly described in the first section of this article. The proposed Scientific Consensus on Brain Fingerprinting presumes a thorough understanding of the information contained therein. It also assumes familiarity with the articles in the literature cited in the Background section below. In the course of developing a consensus, some points have arisen on which there is considerable diversity of opinion. Some of these Differing Views on Brain Fingerprinting are briefly outlined following the Scientific Consensus on Brain Fingerprinting.
Genome-wide association study (GWAS) technology has been a primary method for identifying the genes responsible for diseases and other traits for the past ten years. GWAS continues to be highly relevant as a scientific method. Over 2,000 human GWAS reports now appear in scientific journals. Our free eBook aims to explain the basic steps and concepts to complete a GWAS experiment.
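The core association step of a GWAS can be illustrated with a chi-square test on a 2x2 table of allele counts in cases versus controls. The counts below are invented; real pipelines add quality control, covariate adjustment, and multiple-testing correction across millions of SNPs.

```python
# Toy sketch of a single-SNP GWAS association test: chi-square
# statistic for a 2x2 table of allele counts. Counts are made up.

def chi_square_2x2(a, b, c, d):
    """Chi-square statistic for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    expected = [
        (a + b) * (a + c) / n, (a + b) * (b + d) / n,
        (c + d) * (a + c) / n, (c + d) * (b + d) / n,
    ]
    observed = [a, b, c, d]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Rows: cases, controls; columns: risk-allele count, other-allele count.
stat = chi_square_2x2(60, 40, 40, 60)
```

A large statistic (compared to the chi-square distribution with one degree of freedom) flags the SNP as associated with the trait; the table here gives a statistic of 8.0, nominally significant before any multiple-testing correction.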
Empowering patients by increasing accessibility to clinical terminology (Nicole Vasilevsky)
Flash talk at Medical Library Association Pacific Northwest Chapter meeting in Portland, OR on October 18, 2016.
http://pnc-mla.cloverpad.org/annual2016
Authors: Erin Foster, Mark Engelstad, Chris Mungall, Peter Robinson, Sebastian Kohler, Melissa Haendel and Nicole Vasilevsky
The Software and Data Licensing Solution: Not Your Dad’s UBMTA (mhaendel)
Presented at the Association of University Technology Managers (AUTM) Annual Conference 2018
Moderator: Arvin Paranjpe, Oregon Health & Science University
Speakers: Frank Curci, Ater Wynne LLP
Melissa Haendel, Oregon Health & Science University
Charles Williams, University of Oregon
Big data is an open frontier, and it’s quickly expanding. However, transaction costs and legal barriers stand squarely in the way of meaningful, far-reaching data integration. We’ll grapple with the issues regarding a large-scale data integration project across humans, model and non-model organisms. Without pointing fingers, we’ll also share a few highlights from the (Re)usable Data Project, which outlined a five-part rubric to evaluate data licenses with respect to clarity and the reuse and redistribution of data. In addition, the topic raises the question: How well-suited are off-the-shelf software and data licenses for universities? Data scientists and software programmers are all too quick to pick one when they release their technology on GitHub. What should technology transfer professionals recommend? We’ll discuss the usefulness and attributes of a uniform software and data license for university researchers and software programmers.
Equivalence is in the (ID) of the beholder (mhaendel)
Presented at PIDapalooza 2018. https://pidapalooza.org/
Determining identifier equivalency is key to data integration and to realizing the scientific discoveries that can only be made by collating our vast disconnected data stores.
There are two key problems in determining equivalency: conceptual and syntactic alignment. Conceptual alignment often relies on Xrefs and string-matching against synonyms. There is indeed a better way! Algorithmic determination of identifier equivalency across different sources can use a combination of Xrefs, prior rules, existing semantic relations, and synonyms to create equivalency cliques that can highlight discrepancies in conceptual definitions for manual review. This is especially useful for data sources subject to concept drift and differing definitions, such as diseases. The syntactic problem is that the same identifier appears in many variant forms, making data joins difficult. We present a framework to reconcile and provide authoritative, integration-ready prefixed identifiers (CURIEs), to capture and consolidate prefixes, and to build links across key resource registries. The combination of JSON-LD context technology with a prefix metadata repository provides the basis for the infrastructure to handle identifiers in a consistent fashion. Finally, this architecture also allows resources to be self-describing “beacons” with respect to their identifiers.
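The CURIE-plus-context mechanism described above can be sketched as a simple prefix-map expansion. The context entries follow common OBO-style URL patterns but are only an illustrative fragment, not the framework's actual registry.

```python
# Sketch: expanding prefixed identifiers (CURIEs) to full IRIs against
# a JSON-LD-style context (prefix map). Context entries are illustrative.

CONTEXT = {
    "HP": "http://purl.obolibrary.org/obo/HP_",
    "OMIM": "https://omim.org/entry/",
}

def expand_curie(curie, context):
    """Turn a prefixed identifier like 'HP:0001252' into a full IRI."""
    prefix, _, local_id = curie.partition(":")
    if prefix not in context:
        raise KeyError(f"Unknown prefix: {prefix}")
    return context[prefix] + local_id

iri = expand_curie("HP:0001252", CONTEXT)
```

Agreeing on one prefix map is what lets differently written forms of the same identifier resolve to a single IRI, so that joins across data sources become simple string equality.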
Building (and traveling) the data-brick road: A report from the front lines ... (mhaendel)
The NIH Data Commons must treat the data it will contain not unlike the mortar and stones of a road. To help our fellow scientist travelers use the road, we must engineer for heavy traffic and diverse destinations. There are many steps to architecting a robust and persistent road. First, the data must be sourced and manipulated into common data models. This requires versioned access to the data, equivalency determination of identifiers within the data (or minting of new ones for the data and/or within it), and manipulation of the data according to common data models (e.g. a genotype-to-phenotype association in one source may relate a variant to a disease, whereas in another it may be a set of alleles associated with a set of phenotypes; each source models the data differently). Inclusion of the data in the Commons must meet all licensing restrictions, which are varied and usually poorly declared, as well as security, HIPAA, and ethics requirements. Software tools are needed to perform the Extract-Transform-Load (ETL) process on a regular cycle to keep the data current, and to assess changes and quality assurance over time. For records that disappear, there needs to be a way to keep an archive of them. Once in the Commons, the data requires a map to navigate the roads: where do you want to go? Indexing and search across the data requires having the data be self-reporting - loading ontologies used in the data for indexing and providing faceted query over these and other attributes, sophisticated text mining tools, relevance ranking, and equivalency and similarity determination from amongst different providers. Once found, the users need vehicles to drive upon the road. These are their workspaces, the place where they design and implement the operations they need in order to get where they want to go.
Unimaginable scientific emeralds are to be found at the end of the road, as the sum of all the data, if well integrated and made computationally reusable, has proven to be well beyond the sum of its parts in getting us where we want to go.
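The common-data-model step described above can be sketched as follows; the source record shapes and field names are invented for illustration, not actual Commons schemas:

```python
# Two hypothetical sources model genotype-to-phenotype data differently;
# an ETL step maps both onto one common association shape.

def from_source_a(rec):
    # Source A relates a single variant to a disease.
    return [{"subject": rec["variant"], "predicate": "associated_with",
             "object": rec["disease"], "source": "A"}]

def from_source_b(rec):
    # Source B relates a set of alleles to a set of phenotypes:
    # emit one association per (allele, phenotype) pair.
    return [{"subject": a, "predicate": "associated_with",
             "object": p, "source": "B"}
            for a in rec["alleles"] for p in rec["phenotypes"]]

assocs = from_source_a({"variant": "var:1", "disease": "OMIM:0815"})
assocs += from_source_b({"alleles": ["allele:1", "allele:2"],
                         "phenotypes": ["HP:0002185"]})
```

Once normalized into one shape, associations from either source can be indexed and queried uniformly.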
Reusable data for biomedicine: A data licensing odyssey (mhaendel)
Biomedical data integrators grapple with a fundamental blocker in research today: licensing for data use and redistribution. Complex licensing and data reuse restrictions hinder most publicly-funded, seemingly “open” biomedical data from being put to its full potential. Such issues include missing licenses, non-standard licenses, and restrictive provisions. The sheer diversity of licenses is particularly thorny for those that aim to redistribute data. Redistributors are often required to contact each sub-source to obtain permissions, and this is complicated by the fact that on each side of the agreement there may be multiple legal entities involved, and some sub-sources may themselves already be aggregating data from other sub-sources. Furthermore, interpreting legal compliance with source data licensing and use agreements is complicated, as data is often manipulated, shared, and redistributed by many types of research groups and users in various and subtle ways. Here, we debut a new effort, the (Re)usable Data Project, where we have created a five-part rubric to evaluate biomedical data sources and their licensing information to determine the degree to which unnegotiated and unrestricted reuse and redistribution are provided. We have tested the (Re)usable Data rubric against various biomedical data sources, ranking each source on a scale of zero to five stars, and have found that approximately half of the resources rank poorly, getting 2.5 stars or less. Our goal is to help biomedical informaticians and other users navigate the plethora of issues in reusing and redistributing biomedical data. The (Re)usable Data Project aims to promote standardization and ease of reuse licensing practices by data providers.
How open is open? An evaluation rubric for public knowledgebases (mhaendel)
Presented at the 2017 International Biocuration Conference.
Data relevant to any given scientific investigation is highly decentralized across thousands of specialized databases. Within the Biocuration community, we recognize that the value of open scientific knowledge bases is that they make scientific knowledge easier to find and compute, thereby maximizing impact and minimizing waste. The ever-increasing number of databases forces us to question our priorities with respect to maintaining them, developing new ones, or senescing/subsuming those that have completed their mission. Therefore, open biomedical data repositories should be carefully evaluated according to the quality, accessibility, and value of the database resources over time and across the translational divide.
Traditional citation count and publication impact factors as a measure of success or value are known to be inadequate to assess the usefulness of a resource. This is especially true for integrative resources. For example, almost everyone in biomedicine relies on PubMed, but almost no one ever cites or mentions it in their publications. While the Nucleic Acids Research Database issues have increased citation of some databases, many still go unpublished or uncited; even novel derivations of methodology, applications, and workflows from biomedical knowledge bases are often “adapted” but never cited. There is a lack of citation best practices for widely used biomedical database resources (e.g. should a paper be cited? A URL? Is mention of the name and access date sufficient?).
We have developed a draft evaluation rubric for evaluating open science databases according to the commonly cited FAIR principles -- Findable, Accessible, Interoperable, and Reusable, but with three additional principles: Traceable, Licensed, and Connected. These additions are largely overlooked and underappreciated, yet are critical to reuse of the knowledge contained within any given database. It is worth noting that FAIR principles apply not only to the resource as a whole, but also to their key components; this “fractal FAIRness” means that even the license, identifiers, vocabularies, APIs themselves must be Findable, Accessible, Interoperable, Reusable, etc. Here we report on initial testing of our evaluation rubric on the recent NIH/Wellcome Trust Open Science projects and seek community input for how to further advance this rubric as a Biocuration community resource.
Deep phenotyping to aid identification of coding & non-coding rare disease v... (mhaendel)
Whole-exome sequencing has revolutionized disease research, but many cases remain unsolved because ~100-1000 candidates remain after removing common or non-pathogenic variants. We present Genomiser to prioritize coding and non-coding variants by leveraging phenotype data encoded with the Human Phenotype Ontology and a curated database of non-coding Mendelian variants. Genomiser is able to identify causal regulatory variants as the top candidate in 77% of simulated whole genomes.
Credit where credit is due: acknowledging all types of contributions (mhaendel)
This is an update for COASP (http://oaspa.org/conference/) on the representation of attribution beyond authorship of a publication. Publications are proxies for the projects and people that are actually engaged in the work, and represent the dissemination aspect. How can we better understand the individual contributions and their impact? The openRIF, openVIVO, and FORCE11 Attribution WG efforts aim to represent scholarship in a computationally tractable manner so as to enable credit and evaluation of all types of scholarly contributions.
Force11: Enabling transparency and efficiency in the research landscape (mhaendel)
Presented at the Feb 2015, NISO Virtual Conference
Scientific Data Management: Caring for Your Institution and its Intellectual Wealth
http://www.niso.org/news/events/2015/virtual_conferences/sci_data_management/
Dataset description using the W3C HCLS standard (mhaendel)
This talk was presented at the BioCaddie (http://biocaddie.org/) workshop at the Force15 conference (https://www.force11.org/meetings/force2015) on changing the future of scholarly communication. The goal was to increase awareness of why a Semantic Web-compliant standard was needed for describing data, where current standards fall short, and how this new emerging standard, which extends prior efforts, can aid data discovery and integration. This work is being led by Michel Dumontier, Alasdair Gray, Joachim Baran, and M. Scott Marshall; participants and end-user testers are welcome, see: http://tiny.cc/hcls-datadesc-ed
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt... (Sérgio Sacani)
Since volcanic activity was first discovered on Io from Voyager images in 1979, changes on Io’s surface have been monitored from both spacecraft and ground-based telescopes. Here, we present the highest spatial resolution images of Io ever obtained from a ground-based telescope. These images, acquired by the SHARK-VIS instrument on the Large Binocular Telescope, show evidence of a major resurfacing event on Io’s trailing hemisphere. When compared to the most recent spacecraft images, the SHARK-VIS images show that a plume deposit from a powerful eruption at Pillan Patera has covered part of the long-lived Pele plume deposit. Although this type of resurfacing event may be common on Io, few have been detected due to the rarity of spacecraft visits and the previously low spatial resolution available from Earth-based telescopes. The SHARK-VIS instrument ushers in a new era of high resolution imaging of Io’s surface using adaptive optics at visible wavelengths.
Cancer cell metabolism: special reference to the lactate pathway (AADYARAJPANDEY1)
Normal Cell Metabolism:
Cellular respiration describes the series of steps that cells use to break down sugar and other chemicals to get the energy they need to function.
Energy is stored in the bonds of glucose, and when glucose is broken down, much of that energy is released.
Cells utilize energy in the form of ATP.
The first step of respiration is called glycolysis. In a series of steps, glycolysis breaks glucose into two smaller molecules of a chemical called pyruvate. A small amount of ATP is formed during this process.
Most healthy cells continue the breakdown in a second process, called the Krebs cycle. The Krebs cycle allows cells to “burn” the pyruvate made in glycolysis to get more ATP.
The last step in the breakdown of glucose is called oxidative phosphorylation (Ox-Phos).
It takes place in specialized cell structures called mitochondria. This process produces a large amount of ATP. Importantly, cells need oxygen to complete oxidative phosphorylation.
If a cell completes only glycolysis, only 2 molecules of ATP are made per glucose. However, if the cell completes the entire respiration process (glycolysis, Krebs cycle, oxidative phosphorylation), about 36 molecules of ATP are created, giving it much more energy to use.
In cancer cells:
Unlike healthy cells that “burn” the entire sugar molecule to capture a large amount of energy as ATP, cancer cells are wasteful.
Cancer cells only partially break down sugar molecules. They overuse the first step of respiration, glycolysis, and frequently do not complete the subsequent step, oxidative phosphorylation.
This results in only 2 molecules of ATP per glucose molecule instead of the 36 or so ATP healthy cells gain. As a result, cancer cells need to use many more sugar molecules to get enough energy to survive.
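The arithmetic behind this claim can be made explicit, as a worked illustration using the approximate textbook yields of 2 and 36 ATP per glucose:

```python
# Approximate textbook ATP yields per glucose molecule.
ATP_GLYCOLYSIS_ONLY = 2     # cell relying on glycolysis alone
ATP_FULL_RESPIRATION = 36   # glycolysis + Krebs cycle + Ox-Phos

# Glucose needed to produce the same amount of ATP (360 here, arbitrary).
target_atp = 360
glucose_glycolysis_only = target_atp / ATP_GLYCOLYSIS_ONLY    # 180 molecules
glucose_full_respiration = target_atp / ATP_FULL_RESPIRATION  # 10 molecules
fold_more = glucose_glycolysis_only / glucose_full_respiration  # 18-fold
```

That roughly 18-fold extra glucose demand is what underlies the "glucose addiction" discussed next.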
Introduction to the Warburg phenomenon:
Warburg effect: cancer cells are usually highly glycolytic (“glucose addiction”) and take up more glucose from their surroundings than normal cells do.
Otto Heinrich Warburg (8 October 1883 – 1 August 1970) was awarded the 1931 Nobel Prize in Physiology or Medicine for his “discovery of the nature and mode of action of the respiratory enzyme.”
The tendency of cancer cells under aerobic (well-oxygenated) conditions to metabolize glucose to lactate (aerobic glycolysis) is known as the Warburg effect. Warburg made the observation that tumor slices consume glucose and secrete lactate at a higher rate than normal tissues.
Nutraceutical market, scope and growth: Herbal drug technology (Lokesh Patil)
As consumer awareness of health and wellness rises, the nutraceutical market - which includes products such as functional foods, beverages, and dietary supplements that provide health benefits beyond basic nutrition - is growing significantly. As healthcare expenses rise, the population ages, and demand for natural and preventative health solutions increases, this industry is expanding quickly. Product formulation innovations and the use of cutting-edge technology for personalized nutrition further drive market expansion. With its worldwide reach, the nutraceutical industry is expected to keep growing, offering significant opportunities for research and investment across a number of categories, including vitamins, minerals, probiotics, and herbal supplements.
This PDF is about schizophrenia.
Richard's entangled adventures in wonderland (Richard Gill)
Since the loophole-free Bell experiments of 2020 and the Nobel prizes in physics of 2022, critics of Bell's work have retreated to the fortress of super-determinism. Now, super-determinism is a derogatory word - it just means "determinism". Palmer, Hance and Hossenfelder argue that quantum mechanics and determinism are not incompatible, using a sophisticated mathematical construction based on a subtle thinning of allowed states and measurements in quantum mechanics, such that what is left appears to make Bell's argument fail, without altering the empirical predictions of quantum mechanics. I think however that it is a smoke screen, and the slogan "lost in math" comes to my mind. I will discuss some other recent disproofs of Bell's theorem using the language of causality based on causal graphs. Causal thinking is also central to law and justice. I will mention surprising connections to my work on serial killer nurse cases, in particular the Dutch case of Lucia de Berk and the current UK case of Lucy Letby.
Mammalian Pineal Body: Structure and Functions
Global phenotypic data sharing standards to maximize diagnostic discovery
1. Global Phenotypic Data Sharing Standards
to Maximize Diagnostic Discovery
Melissa Haendel, PhD and Sebastian Köhler, PhD
RD-Action workshop
April 26th and 27th, Brussels
2. Talk outline
About HPO
Semantic similarity
Leveraging basic research data
Exome analysis and disease discovery
HPO-based tools
Phenotype data standards for exchange
3. What do we mean by phenotype?
= Phenotypic abnormality = clinical feature
A constellation/pattern of clinical features defines a disease:
– [Disease X] ... is a rare developmental disorder defined by the combination of aplasia cutis congenita of the scalp vertex and terminal transverse limb defects. In addition, vascular anomalies such as cutis marmorata telangiectatica ... are recurrently seen.
(Yes, this is a simplification)
4. Starting point: OMIM
Clinical Synopsis (CS) section
Free text phenotypic description
Very expressive
Online Mendelian Inheritance in Man database
5. (Un)Controlled Vocabularies
Not designed to be easily machine-interpretable
Spelling problems, acronyms, etc.
Homonyms:
“... fibrillation ...” can mean ventricular fibrillation or muscle fibrillation, and ventricular fibrillation ≠ muscle fibrillation
6. Why you should care
OMIM query: number of results
large bones: 264
large bone: 785
enlarged bones: 87
enlarged bone: 156
big bones: 16
huge bones: 4
massive bones: 28
hyperplastic bones: 12
hyperplastic bone: 40
bone hyperplasia: 134
increased bone growth: 612
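The fix motivated by this query variability, which the following slides develop, can be sketched as mapping every lexical variant to one canonical term so a single query replaces many. The synonym list and canonical token below are illustrative, not actual HPO content:

```python
# Map lexical variants to one canonical token (placeholder, not a real HP ID).
CANONICAL = "increased-bone-growth"
SYNONYMS = {
    "large bones": CANONICAL,
    "enlarged bone": CANONICAL,
    "bone hyperplasia": CANONICAL,
    "increased bone growth": CANONICAL,
}

def normalize(query):
    """Return the canonical term for a query string, or None if unknown."""
    return SYNONYMS.get(query.lower())
```

With normalization in place, all eleven query variants above would retrieve the same set of records.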
7. Motivation
HPO started in 2008
Goal: computer-interpretable clinical features!
Reliable information extraction from databases based on clinical features
Compute similarity between diseases based on clinical features
Compute similarity between patients based on clinical features
Compute similarity between patients and diseases based on clinical features
Interoperability with basic research to improve diagnostic discovery
Easy to use
Freely available
8. The Human Phenotype Ontology (HPO)
Description of phenotypic abnormalities (or clinical features) in humans
[Figure: excerpt of the HPO graph from the root “Phenotypic abnormality” down through “Abnormality of the nervous system”, “Abnormality of the central nervous system”, “Abnormality of movement”, “Incoordination”, “Ataxia”, “Gait disturbance”, “Gait ataxia”, “Cerebral inclusion bodies”, and “Neurofibrillary tangles” (a term); free-text Clinical Synopsis entries from OMIM records (“Neurofibrillary tangles may be present”, “Paired helical filaments”) point to the term “Neurofibrillary tangles”]
9. The Human Phenotype Ontology (HPO)
Synonyms merged into one term
Textual definitions for each term
id: HP:0002185
name: Neurofibrillary tangles
def: Pathological protein aggregates formed by hyperphosphorylation of a microtubule-associated protein known as tau, causing it to aggregate in an insoluble form. [HPO:sdoelken]
synonym: Neurofibrillary tangles may be present EXACT []
synonym: Paired helical filaments EXACT []
[Figure: the same HPO graph excerpt, with the OMIM phrases “Neurofibrillary tangles may be present” and “Paired helical filaments” merged as synonyms of the single term “Neurofibrillary tangles”]
10. The Human Phenotype Ontology (HPO)
Semantic relations (“subclass of”, “is a”)
From top to bottom, terms get more specific
[Figure: the HPO graph excerpt with “is a” edges linking each term to its more general parents]
11. Computable phenotype definitions of disease
HPO terms are used to annotate (describe) diseases.
E.g. Neurofibrillary tangles is used to annotate Alzheimer disease.
Orphanet + Monarch:
~124,000 annotations of 7,700 rare diseases from OMIM, Orphanet, DECIPHER
~133,000 annotations of 3,145 common diseases
Köhler et al. https://doi.org/10.1093/nar/gkw1039
[Figure: Clinical Synopsis phrases from two OMIM entries mapping to the term “Neurofibrillary tangles”]
15. HPO language translations
We need your help! http://bit.ly/hpo-translations
Translation of labels, synonyms, and text definitions
[Chart: translation progress for Italian, Spanish, Russian, French, German, English layperson, Japanese, and Chinese, ranging from roughly 11–20% for most languages to near 100% for a few]
16. Adoption of HPO
Public-facing databases using HPO to annotate patients
Tools ingesting HPO-annotated data
Köhler et al. https://doi.org/10.1093/nar/gkw1039
17. Why HPO is a successful standard
One language shared by “all”
Synonyms “map” to one concept (HPO term)
Contains terms that no other ontology has
Comes with disease annotations! (Not just “yet another clinical terminology”)
Simple, qualitative phenotyping by deviation (abnormal, abnormal increase, abnormal decrease, ...) to ease analysis
Documented, traceable editing
Open science community project with diverse contributors
Constantly improved and extended; examples:
Layperson version for patients
Language translations
Opposite-relations between terms
18. Talk outline
About HPO
Semantic similarity
Leveraging basic research data
Exome analysis and disease discovery
HPO-based tools
Phenotype data standards for exchange
19. A disease can be described algorithmically as a collection of phenotypes
Differential diagnosis with matching phenotype concepts is already good.
Patient: Splenomegaly, Nasal speech
Disease X: Increased spleen size, Nasal voice
These pairs are synonyms in HPO, i.e. they map to the same terms.
20. A disease can be described algorithmically as a collection of phenotypes
Differential diagnosis with similar but non-matching phenotypes is difficult.
Patient: Splenomegaly, Oral motor hypotonia
Disease X: Ruptured spleen, Decreased muscle mass
21. Similarity between two terms
[Figure: “Oral motor hypotonia” and “Muscular hypotonia of the trunk” share the informative ancestor “Abnormal muscle tone”, giving a high-scoring match; “Oral motor hypotonia” and “Abnormality of calvarial morphology” share only the root “Phenotypic abnormality”, giving a very low-scoring match]
Score: measured by information content
22. Comparing phenotype profiles
E.g. patient-to-disease comparison
The patient’s phenotypes are more similar to Disease A, so Orphamizer would rank Disease A before Disease B.
[Figure: high-, medium-, and very-low-scoring matches between the patient profile and Diseases A and B]
Score: measured by information content
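The information-content scoring idea from these slides can be sketched over a toy is-a hierarchy. The term names mirror the slides, but the annotation frequencies are invented for illustration, and this is a generic Resnik-style measure, not the exact OWLSim implementation:

```python
import math

# Mini is-a hierarchy mirroring slide 21 (child -> parents).
IS_A = {
    "oral motor hypotonia": ["abnormal muscle tone"],
    "muscular hypotonia of the trunk": ["abnormal muscle tone"],
    "abnormal muscle tone": ["phenotypic abnormality"],
    "abnormality of calvarial morphology": ["phenotypic abnormality"],
    "phenotypic abnormality": [],
}

def ancestors(term):
    out = {term}
    for p in IS_A[term]:
        out |= ancestors(p)
    return out

# Invented annotation frequencies: fraction of diseases annotated at/below a term.
FREQ = {"phenotypic abnormality": 1.0, "abnormal muscle tone": 0.05,
        "oral motor hypotonia": 0.01, "muscular hypotonia of the trunk": 0.01,
        "abnormality of calvarial morphology": 0.02}

def ic(term):
    # Information content: rarer terms are more informative.
    return -math.log(FREQ[term])

def resnik(t1, t2):
    # Similarity = IC of the most informative common ancestor.
    return max(ic(t) for t in ancestors(t1) & ancestors(t2))

high = resnik("oral motor hypotonia", "muscular hypotonia of the trunk")
low = resnik("oral motor hypotonia", "abnormality of calvarial morphology")
```

As in the slides, the pair meeting at "abnormal muscle tone" scores high, while the pair meeting only at the root scores zero; profile comparison aggregates such pairwise scores.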
24. Talk outline
About HPO
Semantic similarity
Leveraging basic research data
Exome analysis and disease discovery
HPO-based visualization tools
Phenotype data standards for exchange
25. The genome is sequenced, but ... we still don’t know very much about what it does
OMIM: 3,398 Mendelian diseases with no known genetic basis
ClinVar: at least 120,000 variants with no known pathogenicity (more than twice the 2016 count!)
27. More species = more coverage
Number of human protein-coding genes in the ExAC DB, as per Lek et al. Nature 2016: 19,008
[Figure: 9,739 genes (51%) have phenotype coverage from human data alone, 14,779 (78%) with additional species; the union of coverage in any species (2,195 + 7,544 + 7,235 = 16,974 genes) gives combined coverage of 89%]
Even inclusion of just four species boosts phenotypic coverage of genes by 38% (51% → 89%)
Mungall et al., Nucleic Acids Research: bit.ly/monarch-nar-2016
28. [Images: ulcerated paws (guinea pig pododermatitis), palmoplantar hyperkeratosis, and thick hand skin]
Image credits: “HandsEBS” by James Heilman, MD (own work), licensed under CC BY-SA 3.0 via Commons, https://commons.wikimedia.org/wiki/File:HandsEBS.JPG#/media/File:HandsEBS.JPG; http://www.guinealynx.info/pododermatitis.html
30. Challenge: each database uses its own phenotype vocabulary/ontology
Vocabularies/ontologies: ZFA, MP, DPO, WPO, HP, OMIA, VT, FYPO, APO, SNOMED, NCIT, ...
Databases: WB, PB, FB, OMIA, MGI, RGD, ZFIN, SGD, HPOA, EHR, IMPC, OMIM, QTLdb, ...
31. Can we help machines understand phenotype terms?
“Palmoplantar hyperkeratosis” (human phenotype)
[Cartoon: the machine replies, “I have absolutely no idea what that means”]
32. Decomposition of complex concepts using species-neutral terms
Mungall, C. J., Gkoutos, G., Smith, C., Haendel, M., Lewis, S., & Ashburner, M. (2010). Integrating phenotype ontologies across multiple species. Genome Biology, 11(1), R2. doi:10.1186/gb-2010-11-1-r2
“Palmoplantar hyperkeratosis” (human phenotype) = increased (PATO) + stratum corneum layer of skin (Uberon) + autopod (Uberon) + keratinization (GO)
Species-neutral ontologies, homologous concepts
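This entity-quality (EQ) decomposition can be represented as simple structured data, so comparing two species-specific terms reduces to comparing their species-neutral components. The mouse-style term and the exact component labels below are hypothetical illustrations:

```python
# EQ (entity-quality) post-composition: a complex species-specific term
# decomposed into species-neutral components (labels are illustrative).
EQ = {
    "Palmoplantar hyperkeratosis (HP)": {
        "quality": "increased (PATO)",
        "entity": "stratum corneum layer of autopod skin (Uberon)",
        "process": "keratinization (GO)",
    },
    "Thick footpad skin (hypothetical MP-style term)": {
        "quality": "increased (PATO)",
        "entity": "stratum corneum layer of pedal skin (Uberon, hypothetical)",
        "process": "keratinization (GO)",
    },
}

def shared_components(a, b):
    """Cross-species comparison via shared species-neutral components."""
    return {k for k in EQ[a] if EQ[a][k] == EQ[b][k]}
```

Two terms from different species-specific ontologies can thus be matched by the components they share, even when their labels share no words.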
41. Talk outline
About HPO
Semantic similarity
Leveraging basic research data
Exome analysis and disease discovery
HPO-based tools
Phenotype data standards for exchange
42. Prevailing clinical genomic pipelines leverage only a tiny fraction of the available data
Patient exome/genome + public genomic data
Patient clinical phenotypes + public clinical phenotype and disease data
Patient environment + public environment and disease data
Patient omics phenotypes + public omics phenotypes and correlations
→ possible diseases → diagnosis & treatment
Under-utilized data
44. Combining G2P data for variant prioritization
Whole exome → remove off-target and common variants → Mendelian filters
Variant score from allele frequency and pathogenicity
Phenotype score from phenotypic similarity
PHIVE score to give final candidates
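The final combination step above can be sketched as follows: each candidate gets a variant score and a phenotype score, merged into one rank. The equal-weight mean is an illustrative stand-in for Exomiser's actual PHIVE scoring, and the gene names are placeholders:

```python
# Each candidate gene: a variant score (allele frequency + pathogenicity)
# and a phenotype score (cross-species phenotypic similarity).
candidates = [
    {"gene": "GENE_A", "variant_score": 0.95, "phenotype_score": 0.20},
    {"gene": "GENE_B", "variant_score": 0.70, "phenotype_score": 0.90},
]

def combined_score(c):
    # Illustrative equal weighting of the two evidence types.
    return 0.5 * c["variant_score"] + 0.5 * c["phenotype_score"]

ranked = sorted(candidates, key=combined_score, reverse=True)
```

Note how the phenotype evidence promotes GENE_B above a variant that looks more damaging in isolation, which is the point of the next slide.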
45. Exomiser results for UDP-diagnosed patients
Inclusion of phenotype data improves variant prioritization.
In 60% of the first 1,000 genomes at GEL, Exomiser predicts the top candidate.
In 86% of cases, Exomiser predicts within the top 5.
46. Example case solved by Exomiser
[Figure: the patient’s phenotypic profile matched against candidate genes; the same heterozygous missense mutation in STIM-1 was seen in two patients (no human disease association available, N/A) and matched the Stim1Sax/Sax mouse model]
Exomiser ranked the STIM-1 variant maximally pathogenic based on cross-species G2P data, in the absence of traditional data sources.
http://bit.ly/exomiser
47. Deep phenotyping and “fuzzy” matching algorithms improve diagnostics
4.9% of exomes had dual molecular diagnoses, differentiated with deep phenotyping
48. Talk outline
About HPO
Semantic similarity
Leveraging basic research data
Exome analysis and disease discovery
HPO-based tools
Phenotype data standards for exchange
49. How much phenotyping is enough?
[Cartoon: phenotype frequencies across eight characters: Hair present on head (7), Dark hair (6), Female (4), Male (4), Increased skin pigmentation (3), Enlarged ears (2), Enlarged lip (2), Blue skin (1), Pointy ears (1), Hair absent on head (1), Horns present (1)]
bit.ly/annotationsufficiency
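One way to make "how much phenotyping is enough" concrete, consistent with the cartoon: score a profile by the information content of its phenotypes, so one rare phenotype can outweigh several common ones. The frequencies are the toy counts from the cartoon (out of eight characters); the scoring rule is illustrative, not the Monarch implementation:

```python
import math

# Toy counts from the cartoon: phenotype -> frequency among 8 characters.
FREQ = {"dark hair": 6 / 8, "blue skin": 1 / 8, "enlarged lip": 2 / 8,
        "enlarged ears": 2 / 8, "increased skin pigmentation": 3 / 8}

def ic(pheno):
    # Rare phenotypes carry more information.
    return -math.log(FREQ[pheno])

def profile_score(profile):
    # Illustrative sufficiency score: total information content of the profile.
    return sum(ic(p) for p in profile)
```

Under this scoring, "blue skin" alone distinguishes a profile better than "dark hair" plus "enlarged lip" together.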
51. Matchmaker Exchange for patients, diseases, and model organisms to aid diagnosis and mechanistic discovery
www.monarchinitiative.org
http://bit.ly/Monarch-MME
Goal: Get clinical sites & public databases to provide standardized phenotype data
52. Talk outline
About HPO
Semantic similarity
Leveraging basic research data
Exome analysis and disease discovery
HPO-based tools
Phenotype data standards for exchange
62. Journals are now requiring HPO terms
Robinson, P. N., Mungall, C. J., & Haendel, M. (2015). Capturing phenotypes for precision medicine. Molecular Case Studies, 1(1), a000372. doi:10.1101/mcs.a000372
Each phenopacket can be shared via DOI in any repository outside a paywall (e.g. Figshare, Zenodo, etc.)
Each article can be associated with a phenopacket
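A minimal phenopacket-like record, sketched as a plain dict and serialized to JSON for deposit under its own DOI; the field names follow the spirit of the GA4GH Phenopacket schema but are simplified here:

```python
import json

# Minimal phenopacket-like record: a subject plus HPO-coded phenotypic
# features (simplified, not the full GA4GH schema).
phenopacket = {
    "id": "example-patient-1",
    "subject": {"id": "patient:1"},
    "phenotypicFeatures": [
        {"type": {"id": "HP:0002185", "label": "Neurofibrillary tangles"}},
        {"type": {"id": "HP:0002066", "label": "Gait ataxia"}},
    ],
}

# Serialize to JSON, ready to deposit in a repository (e.g. Zenodo) under a DOI.
serialized = json.dumps(phenopacket, indent=2)
```

Because the phenotypes are HPO-coded rather than free text, the record is directly usable by the matching tools described earlier.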
63. Community “curate-athons” for HPO
Cardiovascular curate-athon at Stanford: ~20 cardiologists (surgeons, pediatric, etc.), four ontologists, and three clinical curators met for two days.
Abnormal Complex
Voltage to be added to all waves
-increased, decreased, fluctuating (alternans)
Duration to be added to all waves
-increased, decreased
P wave
-notching
-axis
QRS
-fractionation
-axis (right/left/extreme)
Q wave
R wave
S wave
R’ wave
S’ wave (abnormal only)
J wave (can be normal variant)
Epsilon wave (abnormal only)
Osborne wave (abnormal only)
Terminal slur wave (can be normal variant)
Delta wave (abnormal only)
Added 100s of clinically relevant cardiophysiology phenotypes to HPO; new exome analysis possible
64. Summary
The Human Phenotype Ontology is a robust standard describing phenotypic abnormalities FOR the community, FROM the community, for deep phenotyping of rare disease patients
Model organism data can fill gaps in our knowledge and aid mechanistic exploration of disease candidates
Tools that leverage the Human Phenotype Ontology can be used to prioritize coding and noncoding variants for WES, WGS, and CNVs
Patients can provide self-phenotyping information as partners in the deep phenotyping process
Phenopackets is a FAIR-based GA4GH exchange standard for facilitating distributed phenotype data sharing for clinics, labs, patients, and journals
65. Acknowledgements
Orphanet: Ana Rath, Annie Olry, Marc Hanauer, Halima Lourghi
Lawrence Berkeley: Chris Mungall, Suzanna Lewis, Jeremy Nguyen, Seth Carbon
RENCI: Jim Balhoff
OHSU: Matt Brush, Kent Shefchek, Julie McMurry, Tom Conlin, Nicole Vasilevsky, Dan Keith
Genomics England/Queen Mary: Damian Smedley, Jules Jacobson
Jackson Laboratory: Peter Robinson, Leigh Carmody
Garvan: Tudor Groza, Craig McNamara
Hipbi / NeuroCure: Dominik Seelow, Markus Schülke-Gerstenfeld
Charite: Dominik Seelow, Tomasz Zemojtel
With special thanks to Julie McMurry for excellent graphic design
One of the workshop questions was: why has the HPO been recommended as an optimal ontology for clinical (phenotypic) descriptions?
I was not part of the process that led to this recommendation, so I will instead give my impression of why HPO has been so successful over the last 8 years.
First, what is the content of HPO? It contains phenotypic abnormalities ... definition in the context of HPO ... bla bla
What data did we want to use in the beginning. This is what we had.
Problems. Well – known. Just briefly.
Why is it so important to have controlled vocabularies at all
Query today:
Search: 'large bone'
Results: 9,128 entries.
Search: 'enlarged bone'
Results: 3,912 entries.
CHV = Consumer Health Vocabulary
Translation teams at: https://github.com/Human-Phenotype-Ontology/HPO-translations/blob/master/README.md
Contact: sebastian.koehler@charite.de
Merged with next slide
You take it from here Melissa?
There is a lot we don’t know about the genome.
As of March 2017, OMIM numbers: 3,398 unknown, 4,964 known.
ClinVar number: at least 121,000,
with the caveat that these are variants that researchers have found suspicious, due to rarity in the population or something else; contextually, 160k variants in the entire genome is not much.
Each organism provides unique genetic & phenotypic data that helps fill in knowledge gaps in the human genome. For example, much work has been done in chicks to understand limb development. I used to work in a fruit fly lab studying the brain, so I am particularly attached to fly data. As you can imagine, phenotypes described for flies, or other models, use very different terms than those used for humans. Later, I will discuss how Monarch is overcoming this challenge. Now I will show you an example of how using phenotype data from other organisms can improve human health.
Our approach is to try and get the machine to understand the terms so that it can assist us intelligently.
We make things digestible. Complex concepts into simpler parts. We use ontologies that are comparative by design.
Represent organism as a biological subject
Represent diseases/genotypes as collections of nodes in the graph
Interoperable with other bioinformatics resources and leverage modern semantic standards
5 root classes:
Phenotypic abnormality, Mode of Inheritance, Clinical modifier, Mortality/Ageing, Frequency
11,813 classes/terms in HPO
~124,000 annotations of 7,700 rare diseases from OMIM, Orphanet, DECIPHER
~133,000 annotations of 3,145 common diseases
OWLsim algorithm
About HPO 2: We want the vocabulary to enable sophisticated phenotypic matching within and across species
Our team has led international ontology development efforts, including ICD11, the HPO, the Gene Ontology, and major tissue/cell ontologies used for mammalian functional genomics. We have extensive experience integrating data using these ontologies. A fundamental challenge is to translate the vocabularies used by clinicians via EMRs and billing systems into those used in primary research data. For example, a clinician may describe a patient as having “Microcephaly” with the EMR code ICD10-Q02. A basic scientist using mice may describe this condition with MP:0003303. To translate between clinician and scientist, we provide services that map equivalent concepts. Finally, TransMed will generate dynamic ontologies by combining existing classifications with data in the system, e.g. to generate disease nosologies based on pathway membership, orthology, and phenotypic similarity.
///
Nosology: We will prototype dynamic ontology generation based on combining our existing knowledge sources. We will apply a mixture of methods. This includes our own k-BOOM Bayesian algorithm that weighs different knowledge sources and ontologies. We will also apply our data-driven techniques for generating nosologies based on molecular mechanistic information ingested into our knowledge graph. For low probability associations and equivalencies that may have high value, we will perform some curation to reconcile these.
https://github.com/monarch-initiative/monarch-disease-ontology/issues/90
Note the two subgraphs; little overlap in the upper areas
This was the novel case we solved. The UDP patient had a number of signs and symptoms, including various platelet abnormalities. The same heterozygous, missense mutation was seen in 2 patients and ranked top by Exomiser. It had never been seen in any of the SNP databases and was predicted to be maximally pathogenic. Finally, a mouse curated by MGI, involving a heterozygous missense point mutation introduced by chemical mutagenesis, exhibited strikingly similar platelet abnormalities.
Example showing how adding fuzzy phenotype matching improves disease diagnosis above using sequence based methodologies alone.
Knowing what the normal distribution and clustering of phenotypes is helps us know that blue skin is rare and can reliably distinguish between phenotype profiles. Likewise to know that if the first phenotype entered is enlarged lip, the next one to ask for would be enlarged ears. The combination of 3 non-unique phenotypes offers a perfect match.
This is a lot of text and not easy to see for the audience.
The classic G+E=P. But the = has a lot that can be applied to aid the linking.
G-P or D (disease)
causes
contributes to
is risk factor for
protects against
correlates with
is marker for
modulates
involved in
increases susceptibility to
G-G (kind of)
regulates
negatively regulates (inhibits)
positively regulates (activates)
directly regulates
interacts with
co-localizes with
co-expressed with
P/D - P/D
part of
results in
co-occurs with
correlates with
hallmark of (P->D)
E-P
contributes to (E->P)
influences (E->P)
exacerbates (E->P)
manifest in (P->E)
G-E (kind of)
expressed in
expressed during
contains
inactivated by
Needs adjusting yet
Fully translational – from bench to bedside – group of stakeholders, contributors, and partners