This document describes a study that identified questionable relationship triples in the Unified Medical Language System (UMLS) Metathesaurus. The researchers developed an algorithm to automatically identify suspicious relationship triples that fall into four cases: 1) conflicting hierarchical relationships, 2) redundant hierarchical relationships, 3) mixed hierarchical/lateral relationships, and 4) multiple mutually exclusive lateral relationships. Statistics on the identified questionable relationships in the UMLS 2010AA release are presented, along with a discussion of relationship issues that may arise from the integration of source vocabularies.
1. 1
Questionable Relationship Triples in the UMLS
Huanying Gu, Gai Elhanan, Michael Halper, Zhe He
Structural Analysis of Biomedical Ontologies Center (SABOC),
New Jersey Institute of Technology, Newark, NJ
2. 2
Outline
• Background
  ◦ Biomedical terminologies and the UMLS
  ◦ Relationships in the UMLS
• Identify suspicious relationship triples
• Results
• Discussion
3. 3
Biomedical Terminologies
• What is a biomedical terminology?
  ◦ A collection of concepts with attributes and relationships
  ◦ Used to encode drugs, diseases, diagnoses, findings, etc.
• The importance of biomedical terminologies
  ◦ Clinical practice, e.g. ICD-10 use in diagnosis coding and billing
  ◦ Biomedical research
  ◦ Healthcare applications: EHR and EMR
4. 4
The Unified Medical Language System
• A system that integrates more than 150 terminologies to enable interoperability between computerized systems in the healthcare industry
• Designed and developed by the United States National Library of Medicine
5. 5
Structure of the UMLS
• Two-level structure of the UMLS
  ◦ Metathesaurus (META):
    ▪ More than 2 million concepts
    ▪ Terms from sources are grouped into concepts
    ▪ More than 50 million relationships
  ◦ Semantic Network
    ▪ 133 semantic types
    ▪ Each concept in META is assigned to at least one semantic type
6. 6
Relationships in the META
• A UMLS META relationship is either:
  ◦ Derived from a source vocabulary
  ◦ Introduced during the integration
• 11 different relationship types (RELs):
  ◦ Hierarchical relationships
    ▪ Parent (PAR) and child (CHD)
    ▪ Broader (RB) and narrower (RN)
  ◦ Lateral (non-hierarchical) relationships
    ▪ e.g. SY (synonym)
7. 7
Problems of Relationship Triples
• Relationships in META are the foundation of concept definitions
• Problems of relationship triples:
  ◦ Multiple relationships can exist between the same pair of concepts; they may come
    ▪ from the same source
    ▪ from different sources
  ◦ Multiple RELs between a pair of concepts may indicate problems
8. 8
Suspicious Relationship Triples
• Relationship triple
  ◦ Source concept – A
  ◦ Target concept – B
  ◦ Relationship – r
  ◦ (r, A, B)
[Figure: concepts A and B linked by two relationships, r1 and r2]
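The (r, A, B) notation above maps naturally onto a small record type. The following is a minimal sketch, not the authors' implementation: the names `Triple`, `group_by_pair`, and `pairs_with_multiple_rels` are hypothetical, and it assumes all triples are stored in one consistent direction (from A to B). It collects the RELs asserted for each ordered concept pair and keeps only the pairs that carry more than one.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    """A relationship triple (r, A, B); hypothetical illustration type."""
    rel: str     # relationship type r, e.g. "PAR", "CHD", "RO"
    source: str  # identifier of source concept A
    target: str  # identifier of target concept B


def group_by_pair(triples):
    """Map each ordered pair (A, B) to the set of RELs asserted from A to B."""
    rels_by_pair = defaultdict(set)
    for t in triples:
        rels_by_pair[(t.source, t.target)].add(t.rel)
    return rels_by_pair


def pairs_with_multiple_rels(triples):
    """Keep only the concept pairs that carry more than one distinct REL."""
    return {
        pair: rels
        for pair, rels in group_by_pair(triples).items()
        if len(rels) > 1
    }


# Toy data with placeholder concept identifiers (not real UMLS content).
triples = [
    Triple("PAR", "A", "B"),
    Triple("CHD", "A", "B"),
    Triple("RO", "A", "C"),
]
candidates = pairs_with_multiple_rels(triples)
```

Only pairs returned by `pairs_with_multiple_rels` need further inspection, since a pair with a single REL cannot fall into any of the suspicious cases.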
9. 9
Identify Suspicious Relationship Triples
• Our algorithm automatically identifies all suspicious relationship triples.
• They fall into the following four cases:
  ◦ Conflicting hierarchical RELs
  ◦ Redundant hierarchical RELs
  ◦ Mixed hierarchical/lateral RELs
  ◦ Multiple mutually exclusive lateral RELs
10. 10
Case 1: Conflicting Hierarchical RELs
• Two or more hierarchical RELs between two different concepts form a hierarchical cycle, in which one or more RELs are incorrect. The conflicting combinations are:
  ◦ PAR and CHD
  ◦ PAR and RN
  ◦ RB and CHD
  ◦ RB and RN
[Figure: concepts A and B linked by PAR (RB) and CHD (RN)]
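The four REL combinations listed for Case 1 can be checked with two set intersections. This is a minimal sketch under the assumption that all RELs for a pair are stored in the same direction; the names `UPWARD`, `DOWNWARD`, and `is_conflicting` are hypothetical and do not come from the study.

```python
# RELs pointing one way in the hierarchy, and RELs pointing the opposite way.
UPWARD = {"PAR", "RB"}
DOWNWARD = {"CHD", "RN"}


def is_conflicting(rels):
    """True if the RELs between one concept pair point both ways in the hierarchy."""
    rels = set(rels)
    return bool(rels & UPWARD) and bool(rels & DOWNWARD)


# The four combinations named on the slide all qualify.
case1_combinations = [{"PAR", "CHD"}, {"PAR", "RN"}, {"RB", "CHD"}, {"RB", "RN"}]
all_flagged = all(is_conflicting(c) for c in case1_combinations)
```

A pair carrying PAR and RB is not flagged here, because both RELs point the same way; that situation is Case 2.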
11. 11
Example of Case 1
[Figure: Dermatologic disorder (UMLS_CODE C0037274) and Dermatitis (UMLS_CODE C0011603) are linked by both PAR (inverse_isa) and CHD (isa), which are conflicting hierarchical relationships]
12. 12
Case 2: Redundant Hierarchical RELs
• A concept is both PAR (CHD) and RB (RN) of another given concept at the same time
  ◦ PAR and RB
  ◦ CHD and RN
[Figure: concepts A and B linked by two hierarchical RELs, labeled CHD (RB) and RN (PAR)]
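Case 2 reduces to a subset test: a pair is redundant when it carries both members of PAR/RB or both members of CHD/RN. The sketch below uses hypothetical names (`REDUNDANT_COMBINATIONS`, `is_redundant`) and assumes the RELs for a pair have already been collected into one set.

```python
# The two same-direction combinations named for Case 2.
REDUNDANT_COMBINATIONS = [{"PAR", "RB"}, {"CHD", "RN"}]


def is_redundant(rels):
    """True if a pair carries two hierarchical RELs that say the same thing."""
    rels = set(rels)
    return any(combo <= rels for combo in REDUNDANT_COMBINATIONS)


redundant_example = is_redundant({"PAR", "RB"})
```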
13. 13
Case 3: Mixed Hierarchical/Lateral RELs
• Two different concepts A and B have one hierarchical relationship and one lateral relationship that are mutually exclusive and cannot hold between the same pair of concepts at the same time.
[Figure: concepts A and B linked by PAR (CHD, RB, RN) and a lateral REL]
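A first-pass filter for Case 3 can flag any pair that carries at least one hierarchical REL and at least one other REL. This is only a sketch: it assumes every REL outside PAR, CHD, RB, and RN counts as lateral, and it yields candidates rather than confirmed errors, since deciding whether the two relationships are truly mutually exclusive is not captured here. The name `is_mixed` is hypothetical.

```python
HIERARCHICAL = {"PAR", "CHD", "RB", "RN"}


def is_mixed(rels):
    """True if a pair carries both a hierarchical REL and a non-hierarchical one."""
    rels = set(rels)
    has_hierarchical = bool(rels & HIERARCHICAL)
    has_lateral = bool(rels - HIERARCHICAL)  # assumption: non-hierarchical means lateral
    return has_hierarchical and has_lateral


mixed_example = is_mixed({"PAR", "RO"})
```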
14. 14
Examples of Case 2 & 3
[Figure: relationships among Right suprascapular vein, Structure of suprascapular vein, and Right external jugular vein. RB (broader, inverse_isa) together with PAR (inverse_isa) are redundant hierarchical relationships; PAR (has_tributary) together with RO (other semantic relation, has_tributary) are mutually exclusive relationships.]
15. 15
Case 4: Multiple Lateral RELs
• Two different concepts A and B have two lateral relationships that are mutually exclusive. Mutual exclusivity can only be asserted from the relationship attributes qualifying both RELs.
[Figure: concepts A and B linked by two lateral RELs]
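Because Case 4 depends on relationship attributes rather than on the REL alone, a check needs an explicit table of attribute pairs considered mutually exclusive. The sketch below is an assumption-laden illustration: `EXCLUSIVE_ATTRIBUTES` is a hypothetical table that a reviewer would have to supply, and its single entry is chosen only for demonstration. The function name `exclusive_lateral_pairs` is also hypothetical.

```python
from itertools import combinations

HIERARCHICAL = {"PAR", "CHD", "RB", "RN"}

# Hypothetical, reviewer-supplied table of attribute pairs treated as exclusive.
EXCLUSIVE_ATTRIBUTES = {frozenset({"associated_with", "mapped_from"})}


def exclusive_lateral_pairs(rels_with_attrs, exclusive=EXCLUSIVE_ATTRIBUTES):
    """Return pairs of lateral (REL, attribute) entries whose attributes clash.

    rels_with_attrs: iterable of (REL, relationship attribute) tuples that
    all belong to the same concept pair.
    """
    lateral = [(rel, attr) for rel, attr in rels_with_attrs if rel not in HIERARCHICAL]
    return [
        (first, second)
        for first, second in combinations(lateral, 2)
        if frozenset({first[1], second[1]}) in exclusive
    ]


flagged = exclusive_lateral_pairs([("RO", "associated_with"), ("RL", "mapped_from")])
```

Keeping the exclusivity table as a parameter makes the judgment about which attributes clash explicit and easy to revise.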
16. 16
Example for Case 4
[Figure: SLE glomerulonephritis syndrome, WHO class V (UMLS_CODE C0268758) and Lupus Erythematosus, Systemic (UMLS_CODE C0024141) are linked by RO (associated_with) and RL (mapped_from)]
18. 18
Discussion
• Certain questionable REL triples can be attributed to the process of source vocabulary integration.
• Questionable relationship triples may be an indicator of term ambiguity.
• Algorithmic approaches that can easily detect and classify such errors are important.
19. 19
Acknowledgement and References
• This work was partially supported by the National Library of Medicine, NIH R01 grant
REFERENCES
[1] O. Bodenreider, "The Unified Medical Language System (UMLS): integrating biomedical terminology." Nucleic Acids Res. 2004 Jan 1;32(Database issue):D267–70.
[2] O. Bodenreider, "Circular hierarchical relationships in the UMLS: Etiology, diagnosis, treatment, complications and prevention." AMIA Annu Symp Proc; 2001:57–61.
[3] UMLS Reference Manual: www.nlm.nih.gov/research/umls/meta2.html.
[4] H. Gu, Y. Perl, G. Elhanan, H. Min, L. Zhang, and Y. Peng, "Auditing concept categorizations in the UMLS." Artif Intell Med. 2004 May;31(1):29–44.
[5] Y. Chen, H. Gu, Y. Perl, and J. Geller, "Structural group-based auditing of missing hierarchical relationships in UMLS." J Biomed Inform. 2009 Jun;42(3):452–67.
[6] O. Bodenreider, S.J. Nelson, W.T. Hole, and H.F. Chang, "Beyond synonymy: exploiting the UMLS semantics in mapping vocabularies." Proc AMIA Symp. 1998:815–9.
[7] F. Mougin and O. Bodenreider, "Approaches to eliminating cycles in the UMLS Metathesaurus: Naïve vs. formal." AMIA Annu Symp Proc; 2005:550–4.
[8] M. Halper, C.P. Morrey, Y. Chen, G. Elhanan, G. Hripcsak, and Y. Perl, "Auditing Hierarchical Cycles to Locate Other Inconsistencies in the UMLS." AMIA Annu Symp Proc; 2011:529–33.