Matchmaking initiatives like GeneMatcher have demonstrated the utility of gene-based matching for identifying unrelated individuals with similar phenotypes and pathogenic variants in the same gene. Phenotype-based matching (PBM) has been attempted less widely because of challenges such as phenotypic variability, the relative paucity of phenotypic detail in clinical genomic databases, and the variable phenotypic terminology used by clinicians and researchers. As part of the Baylor-Hopkins Center for Mendelian Genomics (BHCMG), users submit their cases to PhenoDB using PhenoDB phenotypic terms, which enables the use of semantic similarity-based methods to quantify phenotypic overlap within the database. To test PBM, we initially compared the following methods: Jaccard, Distance, Resnik (with OMIM-based and PhenoDB-based corpora), and Wang. The Resnik-PhenoDB algorithm uses the phenotypic features that describe 4,114 cases in PhenoDB as the corpus for calculating information content, instead of the OMIM clinical synopses or HPO annotations. To validate the matching algorithms, we used a simulated set of 55 cases phenotyped from the OMIM clinical synopses of four well-known phenotypes (OMIM 136140, 615960, 117650, 615273), and demonstrated that for 3 of the 4 disorders, all cases known to have the same disorder had the highest phenotypic similarity scores. We then tested the matching algorithms on phenotypic data from 4,114 unrelated probands in the BHCMG PhenoDB. We chose 3 phenotypes for which multiple unrelated probands are present in the database: Gomez-Lopez-Hernandez Syndrome (N=5, GLHS, OMIM 601853), Hemifacial Microsomia (N=12, HFM, OMIM 164210), and Lateral Meningocele Syndrome (N=5, LMNS, OMIM 130720). The average number of features entered per phenotype was 7.3 for GLHS, 8.14 for HFM, and 0.8 for LMNS.
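To make the corpus-based idea concrete, the following is a minimal sketch of a Jaccard score and a Resnik-style score whose information content is computed from the cases in a database rather than from an external annotation set. It is not the BHCMG implementation: the toy ontology, the term names, the function names, and the best-match-average aggregation across a case's features are all assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical toy ontology (child -> parents); these are NOT real PhenoDB terms.
PARENTS = {
    "root": [],
    "neuro": ["root"],
    "skeletal": ["root"],
    "ataxia": ["neuro"],
    "seizures": ["neuro"],
    "scoliosis": ["skeletal"],
}


def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content(cases):
    """IC(t) = -log(fraction of corpus cases annotated with t or a descendant of t)."""
    counts = Counter()
    for features in cases:
        covered = set()
        for feature in features:
            covered |= ancestors(feature)
        counts.update(covered)  # each case counts at most once per term
    total = len(cases)
    return {term: -math.log(count / total) for term, count in counts.items()}


def resnik_term(a, b, ic):
    """Resnik similarity of two terms: IC of their most informative common ancestor.

    Assumption: terms never seen in the corpus contribute an IC of 0.0.
    """
    common = ancestors(a) & ancestors(b)
    return max(ic.get(term, 0.0) for term in common)


def resnik_case(query, target, ic):
    """Assumed aggregation: average, over query terms, of the best match in the target."""
    if not query or not target:
        return 0.0
    best = [max(resnik_term(q, t, ic) for t in target) for q in query]
    return sum(best) / len(best)


def jaccard(query, target):
    """Jaccard index on the raw feature sets: |A & B| / |A | B|."""
    a, b = set(query), set(target)
    return len(a & b) / len(a | b) if (a or b) else 0.0


# Hypothetical four-case corpus standing in for the database.
corpus = [
    {"ataxia", "scoliosis"},
    {"ataxia", "seizures"},
    {"scoliosis"},
    {"seizures"},
]
ic = information_content(corpus)
```

In this sketch a term shared by every case, such as the root, carries no information, while rarer terms weigh more. A case with very few features gives any of these scores little to work with.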
For each condition, we selected one case at random as the query case and determined the proportion of expected matching cases ranked in the top 1% and top 5% of the 4,114 cases. With the Resnik-PhenoDB algorithm, 3 of the 5 expected GLHS matches were identified in the top 1% and 4 of 5 in the top 5%. For HFM, 2 of the 12 expected matches were identified in the top 1% and 4 of 12 in the top 5%. For LMNS, none of the 5 expected matches were identified in either the top 1% or the top 5%. Using a simulated set of cases, we showed that all 5 algorithms performed similarly and that the Resnik PhenoDB-based algorithm is able to identify and prioritize the expected matching cases among the total number of cases. Applying the Resnik PhenoDB-based algorithm to the real-world BHCMG PhenoDB showed the importance of detailed case descriptions if PBM is desired. Efforts to improve the availability and consistency of phenotypic annotations, together with enhanced similarity calculation methods, will improve the fidelity and utility of PBM.
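The top 1% and top 5% evaluation described above can be sketched as follows. This is an illustrative outline only: the function name, the rounding of the cutoff up to a whole number of cases, and the example scores are assumptions and do not come from the study.

```python
import math


def hits_in_top_fraction(scores, expected_ids, fraction):
    """Count expected matches ranked within the top `fraction` of scored cases.

    scores: dict mapping case id -> similarity to the query case.
    Assumption: the cutoff is rounded up to a whole number of cases (at least 1).
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    # round() guards against floating-point noise before taking the ceiling.
    cutoff = max(1, math.ceil(round(fraction * len(ranked), 9)))
    top = set(ranked[:cutoff])
    return len(top & set(expected_ids)), len(expected_ids)


# Hypothetical data: 100 cases with strictly decreasing similarity scores.
example_scores = {f"case{i}": 100 - i for i in range(100)}
example_expected = {"case0", "case3", "case50"}

top1 = hits_in_top_fraction(example_scores, example_expected, 0.01)
top5 = hits_in_top_fraction(example_scores, example_expected, 0.05)
```

Here `top1` and `top5` are `(hits, expected)` pairs, which correspond to statements of the form "3 of the 5 expected matching cases" in the text.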