Published in: Health & Medicine, Education
Segmenting & Merging Domain-specific Modules for Clinical Informatics

  1. Segmenting & Merging Domain-specific Modules for Clinical Informatics
     Chimezie Ogbuji (Cleveland Clinic & Case Western Reserve University)
     Sivaram Arabandi (Case Western Reserve University)
     Songmao Zhang (Chinese Academy of Sciences)
     Guo-Qiang Zhang (Case Western Reserve University)
  2. Introduction
     ● What are we doing and why are we doing it?
         – Generally
         – Specifically
     ● What are the criteria for success?
     ● What are existing best practices and well-documented challenges of ontology re-use?
  3. Introduction
     ● Construct domain-specific ontologies to support data curation and ongoing clinical research activity
     ● PhysioMIMI is an informatics infrastructure for collection, management, and analysis of sleep-related data
     ● Our method was used to bootstrap a Sleep Domain Ontology (SDO)
  4. Goal / Criteria for Success
     ● Want to (automatically)
         – Generate anatomy and clinical terminology modules that make use of principled normal forms, are minimal in size, and preserve the meaning of re-used symbols
         – As much as is computationally feasible
         – Be able to facilitate the customization of a large source ontology such as SNOMED-CT
     ● Provide a framework for bootstrapping terminology for a specific domain
  5. Desiderata for Clinical Terminology
     ● There is a critical need for formal, reproducible methods for recognizing and filling gaps in medical terminologies (Cimino 1998)
     ● Clinical terminology systems need to extend smoothly and quickly in response to the needs of users (Rector 1999)
         – A fixed, enumerated list of concepts can never be complete and results in a combinatorial explosion of terms (exhaustive pre-coordination)
  6. Desiderata (cont.)
     ● Post-coordination is a contrasting approach where a set of atomic concepts is used to create new terms on demand rather than a priori
     ● Rector (2003) proposed a set of normalization criteria and an approach for decomposing and recombining disjoint, homogeneous taxonomies
     ● Goal is for trees of primitive terms to serve as a terminological framework that minimizes implicit differentia
         – Discrete coordinate system
  7. Background
     ● Related efforts regarding
         – Ontology merging
         – Ontology modularization
     ● Review formalisms for ontology modularization
         – What is a deductive, conservative extension?
         – What is a module?
     ● What is the difference between a segment and a module?
  8. Related Work
     ● Noy and Musen (2000)
         – Discuss how to either automate the merging and alignment or guide the user, suggesting conflicts and actions to take
         – Rely on lexical matching of term names
     ● Bontas and Tolksdorf (2005)
         – Similar goal to Noy & Musen
         – User provides a list of term matches between source & target
         – Follow semantic connections from these terms
  9. Related Work (cont.)
     ● Bontas et al. (2005) identify the following challenges in ontology re-use:
         – Automated translation of source ontologies into common KR format
         – Customization of source ontology
         – Performance challenges of large medical ontologies
  10. Related Work (cont.)
      ● d'Aquin et al. (2006)
          – Use a modularization algorithm based on a traversal paradigm
          – Describe 3 generic steps of dynamic knowledge selection algorithms:
              ● Selection of relevant ontologies
              ● Modularization via an algorithm
              ● Merging of ontology modules in a meaningful way
          – Claim all entailments are preserved but do not demonstrate how this is guaranteed
  11. Modularization
      ● Move to introduction (single bullet item)
      ● The size of major medical ontologies is prohibitive for deductive reasoning
      ● In addition, and more relevant here, their size is a significant challenge to terminology management
      ● Ontology modularization is a blossoming field in logic engineering
  12. Deductive, Conservative Extensions
      ● Grau et al. (2008) define a formal relationship between DL ontologies: deductive, conservative extension
      ● Use case: we are developing ontology P and want to re-use a set of symbols from ontology Q without changing their meaning
      ● If the symbols they have in common are re-used in this way then:
          – P + Q is a conservative extension of Q
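The condition on slide 12 can be written symbolically. This is a sketch of the usual formulation; the notation (Sig for signature, S for the shared symbols, the entailment symbol) is assumed here and does not appear on the slides.

```latex
% P \cup Q is a deductive conservative extension of Q with respect to a
% signature S if the union entails nothing new about the symbols in S:
\[
\text{for every axiom } \alpha \text{ with } \mathrm{Sig}(\alpha) \subseteq S:
\qquad
P \cup Q \models \alpha \iff Q \models \alpha
\]
```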
  13. Module
      ● When answering a query involving terms in O (its signature or vocabulary), importing O'1 should give the same answers as if O' had been imported instead:
          – O'1 is a more manageable fragment of O'
      ● Then we say O'1 is a module for O in O'
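The module condition on slide 13 can be sketched in the same style. The notation is again an assumption: Sig(O) stands for the signature of O, and "same answers" is read as "same entailments over that signature".

```latex
% O'_1 \subseteq O' is a module for O in O' if importing the fragment
% yields the same consequences over the vocabulary of O as importing O':
\[
\text{for every axiom } \alpha \text{ with } \mathrm{Sig}(\alpha) \subseteq \mathrm{Sig}(O):
\qquad
O \cup O' \models \alpha \iff O \cup O'_1 \models \alpha
\]
```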
  14. Materials
      ● SNOMED-CT
      ● FMA
      ● Common anatomy signature
  15. Materials
      ● There is a reasonable consensus around two reference ontologies in clinical medicine
          – SNOMED-CT and the Foundational Model of Anatomy (FMA)
      ● Both leverage an underlying formal knowledge representation
  16. SNOMED-CT
      ● A comprehensive terminological framework for clinical documentation and reporting
      ● Comprises about half a million concepts:
          – Clinical findings, procedures, body structures, organisms, substances, pharmaceutical products, specimens, quantitative measures, and clinical situations
      ● Has an underlying description logic (EL)
          – EL has been proven to be suitable for medical terminology
  17. SNOMED-CT Challenges
      ● Its size deters the use of logical inference systems to manage and process it (due to performance issues)
      ● Most description logic systems run into challenges with memory exhaustion when classifying it in its entirety
      ● In some cases, its definitions are inconsistent or incomplete
      ● However, it is the de facto reference for clinical terminology
  18. SNOMED-CT SEP Triplets
      ● SNOMED-CT uses SEP triplets to model anatomy concepts and their relationships to each other
      ● For every proper SNOMED-CT anatomy concept (an Entire class), there are two auxiliary classes:
          – A Structure class
          – A Part class
      ● Main motivation is to rely on subsumption to reason about part-whole relationships
  19. SEP Triplets
      Example: Lower respiratory tract structure (part), Structure of respiratory system (structure), Entire respiratory system (entire)
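How subsumption can stand in for part-whole reasoning in an SEP scheme can be sketched in a few lines. This is a toy illustration only: the class names, the `SEPTriplet` and `Taxonomy` helpers, and the hand/finger example are hypothetical and are not taken from SNOMED-CT or from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SEPTriplet:
    structure: str  # S: the entity itself or any of its parts
    entire: str     # E: the entity as a whole
    part: str       # P: proper parts of the entity


class Taxonomy:
    """Stores only is-a links; part-whole facts are encoded through them."""

    def __init__(self):
        self.parents = {}  # class name -> set of direct superclasses

    def add_triplet(self, t):
        # Both the Entire and the Part class sit under the Structure class.
        self.parents.setdefault(t.entire, set()).add(t.structure)
        self.parents.setdefault(t.part, set()).add(t.structure)

    def add_part_of(self, child, whole):
        # "child is part of whole" becomes: child's Structure is-a whole's Part.
        self.parents.setdefault(child.structure, set()).add(whole.part)

    def is_subsumed_by(self, cls, ancestor):
        # Depth-first walk up the is-a links.
        stack, seen = [cls], set()
        while stack:
            current = stack.pop()
            if current == ancestor:
                return True
            if current not in seen:
                seen.add(current)
                stack.extend(self.parents.get(current, ()))
        return False


# Hypothetical class names, chosen only for illustration.
hand = SEPTriplet("Hand structure", "Entire hand", "Hand part")
finger = SEPTriplet("Finger structure", "Entire finger", "Finger part")

tax = Taxonomy()
tax.add_triplet(hand)
tax.add_triplet(finger)
tax.add_part_of(finger, hand)

# The finger is a proper part of the hand, found purely by subsumption.
finger_in_hand = tax.is_subsumed_by(finger.entire, hand.part)
# The hand as a whole is not a proper part of itself.
hand_in_hand = tax.is_subsumed_by(hand.entire, hand.part)
```

The point of the design is that a classifier which only computes subsumption can still answer part-of queries, at the cost of two auxiliary classes per anatomy concept.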
  20. Foundational Model of Anatomy
      ● Has a goal to conceptualize the physical objects and spaces that constitute the human body
      ● Leverages a frame-based knowledge representation to formulate over 75,000 concepts including:
          – Macroscopic, microscopic, and sub-cellular canonical anatomy
      ● Anatomy is fundamental to biomedical domains
  21. FMA (cont.)
      ● Concepts are connected by several mereological relations
      ● Primarily concerned with part_of and has_part
      ● Adheres to a strict, Aristotelian modeling paradigm
          – Ensures definitions are consistent and state the essence of anatomy in terms of their characteristics
      ● Using a 2006 OWL translation from the version in the OBO Foundry
  22. Common Anatomy Signature
      ● There is a significant overlap between anatomy terms in SNOMED-CT and FMA
      ● Bodenreider and Zhang (2006) analyzed this overlap
      ● Leveraged lexical and structural analysis
      ● Identified ~7,500 common concepts
          – Referred to as Sanatomy
      ● Key to the general applicability of our method within the domain of clinical medicine
  23. Normal Forms
      ● Similarly, the SNOMED-CT manual describes methods for generating normal forms
      ● Canonical forms composed of maximally decomposed logical expressions
          – Entailments from full SNOMED-CT still follow from normal forms
      ● Useful for comparing post-coordinated expressions during retrieval or analysis of data
  24. Methods
      ● Start with a list of user-specified SNOMED-CT concepts
          – Determines the domain
      ● 3-step process resulting in
          – A SNOMED-CT module: O'snct-fma
          – Transliteration of SEP triplets
          – An FMA segment: O'fma-snct
      ● Segmentation heuristic
      ● Directly merge into a single ontology
  25. Core Procedure
      ● Extract normal forms from SNOMED-CT
      ● SNOMED-CT anatomy terms in Sanatomy that are reached during the extraction are replaced and used as seeds to extract a segment from the FMA
      ● Axioms involving SNOMED-CT anatomy terms in Sanatomy and the terms themselves are replaced such that they preserve the intent of the SEP triplet scheme using FMA terms
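The first two bullets of the core procedure can be sketched as a traversal that rewrites mapped anatomy terms and collects them as FMA seeds. This is a minimal sketch, not the authors' implementation: the dictionary encoding, the `extract_and_map` name, the `fma:` identifiers, and the `is_a` parent are all hypothetical, and the plain term swap shown here ignores the SEP-preserving rewrite described in the third bullet.

```python
def extract_and_map(definitions, mapping, start_terms):
    """Walk definitions from the start terms, swapping mapped anatomy terms.

    definitions: concept -> list of (role, filler) pairs (assumed encoding)
    mapping:     SNOMED-CT anatomy term -> FMA term (stands in for Sanatomy)
    Returns the rewritten module and the FMA terms to use as seeds.
    """
    module, seeds = {}, set()
    stack = list(start_terms)
    while stack:
        concept = stack.pop()
        if concept in module:
            continue
        rewritten = []
        for role, filler in definitions.get(concept, ()):
            if filler in mapping:
                # Mapped anatomy term: replace it and remember it as a seed.
                fma_term = mapping[filler]
                seeds.add(fma_term)
                rewritten.append((role, fma_term))
            else:
                # Unmapped term: keep it and continue the extraction from it.
                rewritten.append((role, filler))
                stack.append(filler)
        module[concept] = rewritten
    return module, seeds


# Finding sites follow the Corticobasal Degeneration example in the slides;
# the is_a parent and the "fma:" identifiers are made up for illustration.
definitions = {
    "Corticobasal degeneration": [
        ("is_a", "Degenerative disorder"),
        ("finding_site", "Cerebral cortex structure"),
        ("finding_site", "Basal ganglion structure"),
    ],
}
mapping = {
    "Cerebral cortex structure": "fma:Cerebral cortex",
    "Basal ganglion structure": "fma:Basal ganglion",
}

module, seeds = extract_and_map(definitions, mapping, ["Corticobasal degeneration"])
```

The returned `seeds` set is what a segmentation step over the FMA would start from.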
  26. Segmentation Heuristic
      ● Seidenberg and Rector (2006) describe an ontology segmentation heuristic that starts with a set of terms and creates an extract from an ontology around those terms
          – Traverses ontology structure and is limited by user-specified recursion depth
      ● Inspiration for modularization algorithm of d'Aquin et al. (2006)
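A depth-limited traversal of this kind can be sketched as follows. It is a simplified stand-in, not the Seidenberg and Rector algorithm itself: the `links` encoding, the `extract_segment` name, and the toy part_of chain are assumptions made for illustration. Only outgoing (upward) links are followed.

```python
from collections import deque


def extract_segment(links, seeds, max_depth):
    """Collect every term reachable from the seeds within max_depth hops.

    links: term -> list of (relation, target) pairs pointing "upwards"
           (superclasses, wholes); this encoding is an assumption.
    Returns the segment and its periphery, i.e. the segment terms that
    still have links leading outside the segment.
    """
    segment = set(seeds)
    queue = deque((seed, 0) for seed in seeds)
    while queue:
        term, depth = queue.popleft()
        if depth == max_depth:
            continue  # recursion limit reached; do not expand further
        for _relation, target in links.get(term, ()):
            if target not in segment:
                segment.add(target)
                queue.append((target, depth + 1))
    periphery = {
        term
        for term in segment
        if any(target not in segment for _rel, target in links.get(term, ()))
    }
    return segment, periphery


# Toy part_of chain used only to exercise the function.
links = {
    "Cerebral cortex": [("part_of", "Cerebrum")],
    "Cerebrum": [("part_of", "Brain")],
    "Brain": [("part_of", "Central nervous system")],
    "Central nervous system": [("part_of", "Nervous system")],
}

segment, periphery = extract_segment(links, {"Cerebral cortex"}, max_depth=2)
```

Terms left in `periphery` are natural candidates for a later, incremental extraction with a larger depth or new seeds.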
  27. Seidenberg and Rector (2006)
  28. Segments vs. Modules
      ● The segmentation heuristic we use is in contrast to those of Grau et al. (2008) that produce modules with 100% semantic fidelity
      ● Sacrifice semantic fidelity for an expedient extraction process
      ● The (tractable) calculation of deductive, conservative extensions for EL is an open research problem
      ● Or at the very least a challenging problem
  29. Reifying SEP triplets
      ● Need to replace SNOMED-CT anatomy terms in a way that preserves the intent of the SEP anatomy scheme
      ● Transcribe them into a more expressive description logic
      ● Define a set of rules to determine how axioms involving mapped SNOMED-CT terms are replaced
      ● Schulz et al. (1998) describe how to logically identify components of an SEP triplet
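One common logical reading of an SEP triplet, and the kind of replacement it suggests, can be sketched as below. This is an assumption for illustration, not the rule set used by the authors: E_X, S_X, and P_X denote the Entire, Structure, and Part classes of an anatomical entity X, F_X denotes the FMA class mapped to X, and part_of is read as proper parthood.

```latex
% Assumed reading of the triplet for an anatomical entity X:
\[
P_X \equiv \exists \mathit{part\_of}.E_X
\qquad
S_X \equiv E_X \sqcup \exists \mathit{part\_of}.E_X
\]
% Replacing the SNOMED-CT classes by expressions over the mapped FMA class F_X:
\[
E_X \mapsto F_X
\qquad
P_X \mapsto \exists \mathit{part\_of}.F_X
\qquad
S_X \mapsto F_X \sqcup \exists \mathit{part\_of}.F_X
\]
```

Under this reading, the disjunction in the Structure class is what takes the result outside EL.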
  30. Definitions
      ● Terms:
          – Osnomed is the short normal form of SNOMED-CT starting from a user-specified term set
      ● Anatomy module for a clinical domain
          – O'snct-fma is a module for Osnomed in Ofma with respect to Sanatomy
      ● Clinical domain module for anatomy
          – O'fma-snct is a module for Ofma in Osnomed with respect to Sanatomy
  31. Results
      ● The applied domain
          – Sleep studies (Polysomnograms)
      ● Quantitative analysis
          – With and without the use of normal forms
      ● Example
      ● How the goals were met
      ● Advantages
      ● Challenges
  32. Analysis
      ● Results:
          – 825 (718) classes in O'snct-fma
          – 901 (648) classes in O'fma-snct
          – 81 (53) SNOMED-CT anatomy concepts in Sanatomy were reached
          – 43 (35) were structures, 37 (17) were entire parts, one was a part
      *Numbers in parentheses are within the normal form
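The breakdown of reached anatomy concepts can be checked against the reported totals. The figures are those on the slide; the only assumption is that the single Part class, which has no parenthesized figure, is also present in the normal form.

```python
# (all concepts, concepts within the normal form), as reported on the slide.
reached = (81, 53)
by_kind = {
    "structure": (43, 35),
    "entire": (37, 17),
    # Assumption: the one Part class also appears in the normal form,
    # which is what makes the normal-form total agree.
    "part": (1, 1),
}
totals = tuple(sum(counts[i] for counts in by_kind.values()) for i in range(2))
```

With that assumption, `totals` matches `reached` in both columns.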
  33. Analysis (cont.)
      ● Of the 366 (85) disorders and procedures, 23 (4) were cross-boundary definitions
      ● 266 (232) FMA classes were at the periphery of the segment extraction heuristic
      ● Candidates for subsequent FMA extraction
          – Incrementally expand the domain by connections to related parts of human anatomy
  34. SEP Reification Example
      ● In SNOMED-CT, Corticobasal Degeneration is a disorder that has (as its finding sites):
          – Cerebral cortex (structure)
          – Basal ganglion (structure)
      ● As a result of the SEP reification, it is defined as follows
  35. Achieving the Goals
      Goal                                                Approach
      1. Identify and fill gaps in clinical terminology   1. Allow an informatician to seed and control the extraction
      2. Use canonical, normalized representations        2. Take advantage of normal form transformations
      3. Has sufficient expressive power                  3. Leveraging more expressive KR
      4. Re-uses the FMA                                  4. Use a set of rules to reify SEP triplets
  36. Advantages
      ● We further demonstrate the general value of ontology segmentation within the context of biomedical terminology
      ● Address the challenge of managing terminology and filling in gaps using reference ontologies in a coordinated way
      ● The use of a more expressive DL to reify SEP triplets is similar to the approach of Suntisrivaraporn (2007)
          – We use terms from a reference ontology of anatomy
  37. FMA Enrichment
      ● Provides partitive axioms that connect the cerebral cortex to 100 other subordinate anatomical entities
  38. Advantages (cont.)
      ● O'snct-fma is a deductive, conservative extension of its combination with O'fma-snct
          – Every inclusion axiom involving FMA terms alone in the combination also holds in FMA as a whole
          – The reification process takes advantage of the fidelity of the SNOMED-CT to FMA mappings
      ● Any application that uses the FMA can still use the combination without loss of meaning of the FMA terms
  39. Challenges
      ● The use of the disjunction operator introduces the need for a more expressive description logic than EL++
      ● Subsumption links are only traversed upwards from target terms
          – Found that downward traversal significantly impacts the size of the segment
  40. Cross-module Definitions
      ● SNOMED-CT concepts in O'snct-fma defined by role restrictions where the filler class involves anatomy terms in Sanatomy
      ● These embody the kinds of explicit definitions that normal forms attempt to facilitate
      ● In some cases, the definitions are enriched due to connections to FMA
          – Resulting in richer entailment
  41. Conclusion (cont.)
      ● However, for an application that uses SNOMED-CT, the same disease may have 2 sites where one is a SNOMED-CT concept and the other is an FMA concept.
