Your SlideShare is downloading. ×
Phenotype terminologies in use for genotype-phenotype databases: a common core for standardisation and interoperability - Ana Rath
Upcoming SlideShare
Loading in...5

Thanks for flagging this SlideShare!

Oops! An error has occurred.


Introducing the official SlideShare app

Stunning, full-screen experience for iPhone and Android

Text the download link to your phone

Standard text messaging rates apply

Phenotype terminologies in use for genotype-phenotype databases: a common core for standardisation and interoperability - Ana Rath


Published on

The community needs to be provided with terminology standards in order to achieve interoperability between databases intended for clinical research and including description of phenotypes. This is …

The community needs to be provided with terminology standards in order to achieve interoperability between databases intended for clinical research and including description of phenotypes. This is crucial to interpret genomic rearrangements as well as future high-throughput sequence data. The aim of our work was to promote a core terminology of phenotypes interoperable with all the terminologies in use. Relevant terminologies in use by different communities to describe phenomes were cross–referenced: PhenoDB (2846 terms), London Dysmorphology Database (LDDB; 1318 terms), Orphanet (1243 terms), Human Phenotype Ontology (9895 terms, 22/08/2012), Elements of Morphology (AJMG; 423 terms), ICD10 (1230 terms), as well as medical terminologies in use: UMLS (7,957,179 distinct concept terms), SNOMED CT (>311,000 concepts), MeSH (26,853 concepts) and MedDRA (69,389 concepts). We established a strategy to compare them to find commonalities and differences, using ONAGUI as a tool to pick-up exact matches. The non-exact matches were verified manually by an expert. A core-terminology of 2,300 terms was derived and analysed by a panel of experts (International Consortium for Human Phenotype Terminologies – ICHPT). The resulting consensual terminology will be freely available in a dedicated website ( and mappings with other terminologies will be given in order to ease the interoperability between databases without disturbing the habits of the different groups of users.

Published in: Science, Education, Technology

  • Be the first to comment

  • Be the first to like this

No Downloads
Total Views
On Slideshare
From Embeds
Number of Embeds
Embeds 0
No embeds

Report content
Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

No notes for slide


  • 1. Phenotype terminologies in use for genotype-phenotype databases: A common core for standardisation and interoperability HVP5 – UNESCO, Paris – 22 May 2014 Aymé S., Rath A., Chanas L., Hamosh A., Robinson P.N. & The International Consortium for Human Phenotype Terminologies
  • 2. Do you mean? Long narrow head dolichocephaly scaphocephaly
  • 3. Different resources, different terminologies (e)HR: SNOMED CT Others? Free text Mutation/patient registries, databases: HPO LDDB PhenoDB Elements of morphology Others? Free text? Tools for diagnosis: HPO LDDB Orphanet
  • 4. Levels of granularity Disorders • Purpose: coding diagnoses (i.e. medical records) Clinical manifestations • Purpose: describing patients, genotype-phenotype correlations, … (i.e. assitance-to-diganosis tools, research databases) Specialised terms • Fit the particular needs of a disease-focused database/registry (i.e. Phe values in PKU and related disorders) For phenotype annotations, interoperability between terminologies is needed at the clinical manifestations level.
  • 5. Interoperability based on mappings Syntaxic: • The terms are identical  Can be done by machines Semantic: • The concepts are identical  Should be done by humans Structural: • The comprehension of the concepts is identical  Impossible to maintain
  • 6. Phenotype terminology project Aims: • Map commonly used clinical terminologies (Orphanet, LDDB, HPO, Elements of morphology, PhenoDB, UMLS, SNOMED-CT, MESH, MedDRA):  automatic map, expert validation, detection and correction of inconsistencies • Find common terms in the terminologies • Produce a core terminology  Common denominator allowing to share/exchange phenotypic data between databases  Mapped to every single terminology
  • 7. Overview of project progress Sept 2012: start of mappings (Orphanet) EUGT2 – EUCERD workshop (Paris, September 2012) • Constitution of the International Consortium of Human Phenotype Terminologies ICHPT workshop (ASHG, Boston, October 2013) • Selection of 2,300 core terms PhenoDB HPO Orphanet LDDB Elements of Morphology POSSUM SNOMED CT (IHTSDO) DECIPHER IRDiRC
  • 8. Step 1: mapping terminologies Orphanet: 1357 terms (Orphanet database, version 2008) LDDB: 1348 dysmorphological terms (Installation CD) Elements of Morphology: 423 terms (retrieved manually from publication AJMG, January 2009) HPO: 9895 terms (download bioportal, obo format, 30/08/12) PhenoDB: 2846 terms (given in obo format, 02/05/2012) UMLS: (version 2012AA) (integrating MeSH, MedDra, SNOMED CT)
  • 9. Tools OnaGUI (INSERM U729): ontology alignment tool  Work with file in owl format  I-Sub algorithm: detect syntaxic similarity  Graphical interface to check automatic mappings and manually add ones Metamap (National Library of Medicine): a tool to map biomedical text to the UMLS Metathesaurus Perl scripts: format conversion, launching Metamap, comparison of results…
  • 10. Comparison of mappings and deduction Perl script to compare all the mappings and infer mappings of non-Orphanet terminologies Eg: Orphanet ID XX mapped to YY in HPO and ZZ in LDDB -> deduction: YY and ZZ should probably map Retrieve HPO mappings versus UMLS, MeSH First figures: LDDB El. Morpho PhenoDB HPO UMLS… Orphanet E: 1062 E: 416 E: 978 E: 2228 E: 6948 LDDB D: 275 D: 533 D: 1123 D:2678 El. Morpho D: 177 D: 716 D: 409 PhenoDB D: 1045 D:3268 HPO D: 6307+4800 UMLS…
  • 11. Mapping of non-Orphanet terminologies Automatic and infered mappings were checked by experts • Using OnaGUI for all, except UMLS  Automatic I-Sub: 7.0 + deduction • Metamap + deduction + HPO mappings Figures: El. Morpho PhenoDB HPO UMLS… LDDB D: 257 +23 added D:528, 92%E A:674, 38%E D: 1105, 87%E A: 2084, 23%E D: 2654, 83%E A: 11731 El. Morpho D:174, 50%E A:189, 74%E D:393, 93%E A: 436, 16%E D:405, 84%E A:1248 PhenoDB D:1018, 91%E A: 4168, 6%E D: 3222, 82%E A: 18776 HPO D: 7389 A: 65535 UMLS…
  • 12. First list of common terms Present in at least 3 terminologies Definition of rules for nomenclature Addition of terms present in each terminology as synonyms
  • 13. Next steps Cleaning-up: around 2,300 terms as a result Re-do mappings • In order to provide exact matchs Revision process by the group Addition of definitions • Elements of Morphology • HPO • New definitions Release in a dedicated website, hosted by • Visualisation • Download
  • 14. Talking each other Long narrow head dolichocephalyscaphocephaly
  • 15. Thank you for your attention