Phenotype-based Matching Using PhenoDB Terms in BHCMG PhenoDB to Maximize Whole Exome/Genome Sequencing Data Interpretation - Nara Sobreira

Matchmaking initiatives like GeneMatcher have demonstrated the utility of gene-based matching for identifying unrelated individuals with similar phenotypes and pathogenic variants in the same gene. Phenotype-based matching (PBM) has been attempted less widely because of challenges such as phenotypic variability, the relative paucity of phenotypic details in clinical genomic databases, and the use of variable phenotypic terminology by clinicians and researchers. As part of the Baylor-Hopkins Center for Mendelian Genomics (BHCMG), users submit their cases to PhenoDB using PhenoDB phenotypic terms, which enables the use of semantic similarity-based methodologies to quantify phenotypic overlap within the database. To test PBM, we initially compared the following methodologies: Jaccard, Distance, Resnik (OMIM-based and PhenoDB-based corpora), and Wang. The Resnik-PhenoDB algorithm uses the phenotypic features that describe 4,114 cases in PhenoDB as the corpus for calculating information content, instead of the OMIM clinical synopses or HPO annotations. To validate the matching algorithms, we used a simulated set of 55 cases phenotyped using the OMIM clinical synopses of four well-known phenotypes (OMIM 136140, 615960, 117650, 615273) and demonstrated that for 3 of the 4 disorders, all cases known to have the same disorder had the highest phenotypic similarity scores. We then tested the matching algorithms on phenotypic data from 4,114 unrelated probands in the BHCMG PhenoDB. We chose 3 phenotypes for which multiple unrelated probands are present in the database: Gomez-Lopez-Hernandez Syndrome (GLHS, N=5, OMIM 601853), Hemifacial Microsomia (HFM, N=12, OMIM 164210), and Lateral Meningocele Syndrome (LMNS, N=5, OMIM 130720). The average number of features entered per case was 7.3 for GLHS, 8.14 for HFM, and 0.8 for LMNS.
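To illustrate what it means to use the case database itself as the corpus for information content, here is a minimal Python sketch. It assumes a tiny, made-up term hierarchy and four made-up cases; the term names, the `PARENTS` mapping, and the function names are hypothetical and are not PhenoDB's terms or the authors' implementation. A term's information content is taken as the negative log of the fraction of corpus cases annotated with that term or one of its descendants, and the Resnik-style similarity of two terms is the information content of their most informative shared ancestor.

```python
import math

# Hypothetical toy hierarchy: term -> list of parent terms (not real PhenoDB terms).
PARENTS = {
    "abnormality": [],
    "head": ["abnormality"],
    "limb": ["abnormality"],
    "microcephaly": ["head"],
    "alopecia": ["head"],
    "polydactyly": ["limb"],
}


def ancestors(term):
    """Return the term itself together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content(cases):
    """Compute IC(t) = -log(share of cases annotated with t or a descendant of t)."""
    counts = {term: 0 for term in PARENTS}
    for terms in cases:
        implied = set()
        for term in terms:
            implied |= ancestors(term)  # an annotation implies all its ancestors
        for term in implied:
            counts[term] += 1
    total = len(cases)
    return {t: -math.log(c / total) for t, c in counts.items() if c > 0}


def resnik(term_a, term_b, ic):
    """Return the IC of the most informative ancestor shared by both terms."""
    shared = ancestors(term_a) & ancestors(term_b)
    return max(ic.get(term, 0.0) for term in shared)


# Made-up corpus of four cases, each a set of annotated terms.
corpus = [
    {"microcephaly"},
    {"alopecia"},
    {"polydactyly"},
    {"microcephaly", "polydactyly"},
]
ic = information_content(corpus)
same_branch = resnik("microcephaly", "alopecia", ic)
different_branch = resnik("microcephaly", "polydactyly", ic)
print(same_branch, different_branch)
```

In this toy corpus every case implies the root term, so the root carries no information, and two terms that share only the root score zero. Rare, specific terms carry the most information, which is consistent with the observation that sparsely annotated cases are hard to match.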
We selected one case at random for each condition as the query case and determined the proportion of expected matching cases present in the top 1% and top 5% among the 4,114 cases. With the Resnik-PhenoDB algorithm, 3 of the 5 expected matching cases for GLHS were identified in the top 1% and 4 of 5 in the top 5%. For HFM, 2 of the 12 expected matching cases were identified in the top 1% and 4 of 12 in the top 5%. For LMNS, none of the 5 expected matching cases were identified in either the top 1% or the top 5%. Using a simulated set of cases, we showed that all 5 algorithms performed similarly and that the Resnik PhenoDB-based algorithm is able to identify and prioritize the expected matching cases among the total number of cases. Applying the Resnik PhenoDB-based algorithm to the real-world BHCMG PhenoDB showed the importance of detailed case descriptions if PBM is desired. Efforts to improve the availability and consistency of phenotypic annotations, as well as enhanced similarity calculation methodologies, will improve the fidelity and utility of PBM.
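The top 1% and top 5% evaluation can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' code: the function name `top_percent_hits` is hypothetical, the scores below are synthetic, and rounding the cutoff up to a whole number of cases is an assumption, since the abstract does not say how the percentile boundary was rounded or whether the query case was excluded from the ranking.

```python
import math


def top_percent_hits(scores, expected, percent):
    """Count expected matches ranked within the top `percent` of all scored cases.

    scores:   dict mapping case id -> similarity to the query (higher = more similar)
    expected: set of case ids that share the query's diagnosis
    percent:  whole-number percentage, e.g. 1 or 5
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    # Assumption: round the cutoff up to a whole number of cases.
    cutoff = math.ceil(len(ranked) * percent / 100)
    return sum(1 for case_id in ranked[:cutoff] if case_id in expected)


# Synthetic example: 200 cases whose similarity falls with the case id,
# so case 0 ranks first, case 1 second, and so on.
scores = {case_id: 1.0 / (case_id + 1) for case_id in range(200)}
expected = {0, 1, 7, 150}

hits_top1 = top_percent_hits(scores, expected, 1)  # cutoff = 2 cases
hits_top5 = top_percent_hits(scores, expected, 5)  # cutoff = 10 cases
print(f"{hits_top1} of {len(expected)} in top 1%, {hits_top5} of {len(expected)} in top 5%")

# With the same rounding, a database of 4,114 cases gives these list lengths.
print(math.ceil(4114 * 1 / 100), math.ceil(4114 * 5 / 100))
```

Under this rounding, the top 1% of 4,114 cases is a list of 42 cases and the top 5% is a list of 206, which gives a sense of how far down the ranking a user would have to read.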


  1. Phenotype-based Matching Using PhenoDB Terms in BHCMG PhenoDB to Maximize Whole Exome/Genome Data Interpretation. Nara Sobreira, MD, PhD, Johns Hopkins University, McKusick-Nathans Institute of Genetic Medicine
  2. http://genematcher.org
  3. GeneMatcher overview
      • Intended to find other patients/animal models for a novel candidate disease gene
      • Only deidentified data and genes, so no IRB required
      • Automated matching
      • Submitters choose to follow up at their discretion
      • Now also matching on phenotypic features (since October 1st 2015)
  4. GeneMatcher Matching options
  5. As of May 1st 2016:
      • 4,459 genes
      • 1,675 submitters
      • 55 countries
      • 5,267 matches on 1,216 genes
      [Figure: Growth in number of genes and matches in GeneMatcher. Series: Gene Count, Match Count. X-axis: Dec. 1st, 2013 to April 1st, 2016, in two-month steps. Y-axis: 0 to 6,000.]
  6. Matchmaker Exchange Matching options. As of May 1st 2016:
      • 100 matches with PhenomeCentral
      • 87 matches with DECIPHER
  7. Hum Mut 34:561, 2013; Hum Mut 36:425, 2015. http://phenodbresearch.net OR http://phenodb.org
  8. BHCMG PhenoDB numbers
      • Holds data on 4,426 submissions
      • Including 53 cohorts ranging from 5-295
      • More than 6,225 samples have been sequenced by BHCMG
      • Holds phenotype data from more than 10,284 individuals
      • BHCMG has identified more than 222 novel genes
      • More than 231 known genes and 136 phenotypic expansion
  9. From Robinson PN, Köhler S, Bauer S, Seelow D, Horn D, Mundlos S. The Human Phenotype Ontology: a tool for annotating and analyzing human hereditary disease. Am J Hum Genet. Nov 2008;83(5):610-615.
  10. From Robinson PN, Köhler S, Bauer S, Seelow D, Horn D, Mundlos S. The Human Phenotype Ontology: a tool for annotating and analyzing human hereditary disease. Am J Hum Genet. Nov 2008;83(5):610-615.
  11. Phenotype Matching Algorithms: General Approach
  12. Algorithm Validation
      • Testing set: 44 published cases with known Mendelian phenotypes and detailed phenotypic descriptions
      • Question: Can the algorithms match query cases of a known syndrome to other cases with the same diagnosis in the testing set?
  13. Pairs-Based Testing Approach
      • Defined test set
      • Picked phenotype to be tested, removed all cases of this phenotype from the testing set
      • Picked a case with the testing phenotype as a query case and a case to be put back into the testing set
      • Applied the matching algorithm
      • Is the testing case in the top 1 or top 5 most similar cases?
      • Repeat x 1000
  14. Percent of Cases For Which the Best Phenotypic Match From the Database Has the Same Syndrome

      | Syndrome | SimUI | Jaccard | Distance | Wang | Resnik-PhenoDB | Resnik-OMIM | SimGIC-PhenoDB | SimGIC-OMIM | PhenoDigm |
      | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
      | Congenital Disorder of Deglycosylation | 1 | 1 | 0.87 | 1 | 1 | 1 | 1 | 1 | 1 |
      | Floating-Harbor Syndrome | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
      | Poretti-Boltshauser Syndrome | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
      | Cerebrocostomandibular Syndrome | 0.98 | 0.63 | 0.57 | 0.53 | 0.25 | 0.25 | 0.86 | 0.84 | 0.46 |
  15. BHCMG PhenoDB database use
      • Buske et al. Hum Mutat, 2015 Oct.
      • Removed all cases with fewer than 5 phenotypic features
      • Removed all phenotypes for which only one case was present in database
      • N=1,152 cases across 32 phenotypes
      • Ran “Top 1” and “Top 5” Pairs-Based Test
  16. Fraction of Cases for Which the Matching Case is in Top 5 Most Similar Cases
  17. BHCMG PhenoDB database use
      • “Real-World” Algorithm Testing
      • n=4,114
      • Wide range of phenotypic annotation depth
      • Many cases without assigned OMIM syndrome IDs
  18. How Well Does a Randomly Selected Query Case Match to Other Cases of Same Clinical Syndrome?

      | Syndrome | Top 5 | Top 25 | Top 1st %ile | Top 5th %ile |
      | --- | --- | --- | --- | --- |
      | Gomez-Lopez-Hernandez Syndrome (N=6) | 2/5 | 2/5 | 2/5 | 4/5 |
      | Hemifacial Microsomia (N=13) | 1/12 | 2/12 | 2/12 | 8/12 |
      | Lateral Meningocele Syndrome (N=6) | 0/5 | 0/5 | 0/5 | 0/5 |

  19. What Factors Impact Successful Phenotypic Matching?

      | Syndrome | Phenotypic Features per Case | Top 5 | Top 25 | Top 1st %ile | Top 5th %ile |
      | --- | --- | --- | --- | --- | --- |
      | Gomez-Lopez-Hernandez Syndrome (N=6) | 7 | 2/5 | 2/5 | 2/5 | 4/5 |
      | Hemifacial Microsomia (N=13) | 8 | 1/12 | 2/12 | 2/12 | 8/12 |
      | Lateral Meningocele Syndrome (N=6) | 1 | 0/5 | 0/5 | 0/5 | 0/5 |
  20. Threshold Testing
      • As a user of a phenotype matching algorithm, how far “down the list” would you need to go to find relevant matches?
      • Removed cases with fewer than 5 features
  21. Threshold Testing
  22. Preliminary Conclusions and Next Steps
      • Algorithms perform best for patients/syndromes with rare and highly specific phenotypic annotations
      • Depth of phenotypic annotation is key
      • Inherent limitations to reducing a patient with a Mendelian disorder to a list of phenotypic terms
      • Phenotypic matching in combination with genomic data (e.g. a VCF file) may offer opportunities for gene discovery
  23. Thanks for your attention! Acknowledgements
      • Joel Krier and François Schiettecatte for the phenotype-matching project
      • Ada Hamosh, François Schiettecatte, Corinne Boehm, Julie Hoover-Fong, Reid Sutton, Jim Lupski, David Valle and others for PhenoDB
      • Ada Hamosh and François Schiettecatte for GeneMatcher
      • The CMGs and especially the Baylor-Hopkins CMG team
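
The pairs-based testing approach outlined on slide 13 can be sketched roughly as follows. This is one reading of the slide's steps, not the presenters' code: the function names, the toy cases, and the choice of plain Jaccard similarity over term sets are assumptions made for illustration, and ties are resolved against the planted case, which is also an assumption.

```python
import random


def jaccard(terms_a, terms_b):
    """Jaccard similarity between two sets of phenotype terms."""
    union = terms_a | terms_b
    return len(terms_a & terms_b) / len(union) if union else 0.0


def pairs_based_test(cases, phenotype, similarity, top_k=5, repeats=1000, seed=0):
    """Estimate how often a planted same-diagnosis case ranks in the top `top_k`.

    cases: list of (diagnosis, frozenset of terms) tuples
    """
    rng = random.Random(seed)
    same = [case for case in cases if case[0] == phenotype]
    # Remove every case of the tested phenotype from the testing set.
    background = [case for case in cases if case[0] != phenotype]
    hits = 0
    for _ in range(repeats):
        # Pick one case as the query and a second one to put back.
        query, planted = rng.sample(same, 2)
        pool = background + [planted]
        # sorted() is stable, so on ties the planted case ranks last.
        ranked = sorted(pool, key=lambda case: similarity(query[1], case[1]), reverse=True)
        if planted in ranked[:top_k]:
            hits += 1
    return hits / repeats


# Made-up cases: diagnosis labels and term names are placeholders.
toy_cases = [
    ("A", frozenset({"t1", "t2", "t3"})),
    ("A", frozenset({"t1", "t2", "t4"})),
    ("A", frozenset({"t2", "t3", "t4"})),
    ("B", frozenset({"t5", "t6"})),
    ("B", frozenset({"t6", "t7"})),
    ("C", frozenset({"t8", "t9"})),
    ("C", frozenset({"t1", "t9"})),
]
top1_rate = pairs_based_test(toy_cases, "A", jaccard, top_k=1)
print(top1_rate)
```

Passing the similarity function as an argument makes it straightforward to run the same test with other measures, such as an information-content-based score, and compare the resulting top 1 and top 5 rates.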
