Automated clinical ontology extraction


Presentation of methods for segmenting, merging, and surveying domain-specific clinical ontologies in an automated fashion.

  1. 1. Automated Extraction of Domain-specific Clinical Ontologies
      Segmenting, merging, and surveying modules
      Chimezie
  2. 2. Need for Ontology Bootstrapping
      There is a critical need for formal, reproducible methods for recognizing and filling gaps in medical terminologies (Cimino 1998)
      Clinical terminology systems need to extend smoothly and quickly in response to the needs of users (Rector 1999)
      A fixed, enumerated list of concepts can never be complete and results in a combinatorial explosion of terms (exhaustive pre-coordination)
  3. 3. A general best practice is to re-use ontologies, especially those that have been standardized
      However, there is a proliferation of (domain-specific) clinical ontologies
      This flies in the face of that best practice
      As more projects leverage the full value of reference medical ontologies, there will be an increased need for automated management
      Not there yet: most projects still rely on coding systems
  4. 4. The Goal
      Want to (automatically):
      Customize a large source ontology such as SNOMED-CT in a tractable way
      Generate normalized anatomy and clinical terminology modules that are manageable in size and preserve the meaning of common terms
      Provide a framework for bootstrapping the creation of clinical terminology for a specific domain
  5. 5. Prior Work
      Noy and Musen (2000)
      Discuss how to either automate the merging and alignment or guide the user, suggesting conflicts and actions to take
      Rely on lexical matching of term names
      Bontas and Tolksdorf (2005)
      Similar goal to Noy & Musen
      User provides a list of term matches between source & target
      Follow semantic connections from these terms
  6. 6. Modularization: Ontology Engineering
      Seidenberg and Rector (2006) describe an ontology segmentation heuristic that starts with a set of terms and creates an extract from an ontology around those terms
      Traverses ontology structure and is limited by user-specified recursion depth
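
A depth-limited traversal of this kind can be sketched in a few lines of Python. This is a simplified illustration, not the Seidenberg and Rector algorithm itself: it performs a plain breadth-first walk over outgoing links, and the function name `extract_segment`, the `TOY_LINKS` data, and the concept labels are all assumptions made for the example (they are not real SNOMED-CT content).

```python
from collections import deque


def extract_segment(links, seeds, max_depth):
    """Collect every concept reachable from the seed terms within max_depth links."""
    segment = set(seeds)
    frontier = deque((seed, 0) for seed in seeds)
    while frontier:
        concept, depth = frontier.popleft()
        if depth == max_depth:
            continue  # depth limit reached: do not expand this concept further
        for _relation, target in links.get(concept, []):
            if target not in segment:
                segment.add(target)
                frontier.append((target, depth + 1))
    return segment


# Toy hierarchy (hypothetical labels, for illustration only)
TOY_LINKS = {
    "Atrial fibrillation": [("is_a", "Fibrillation"),
                            ("finding_site", "Atrial structure")],
    "Fibrillation": [("is_a", "Cardiac arrhythmia")],
    "Atrial structure": [("is_a", "Cardiac chamber structure")],
    "Cardiac arrhythmia": [("is_a", "Heart disease")],
    "Cardiac chamber structure": [("is_a", "Heart structure")],
}

segment = extract_segment(TOY_LINKS, ["Atrial fibrillation"], max_depth=2)
print(sorted(segment))
```

With `max_depth=2` the walk stops two links away from the seed, so the most general concepts in the toy data are left out of the segment. Raising the limit trades a larger extract for more context.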
  7. 7. Seidenberg and Rector (2006)
  8. 8. Grau et al. (2008): When developing an ontology P, we want to re-use a set of symbols from (another) ontology Q without changing their meaning
      P + Q is a conservative extension of Q
      When answering a query involving terms in O (its signature or vocabulary), importing O'1 should give the same answers as if O' had been imported instead (both are subsets but O'1 is more manageable)
      Then we say O'1 is a module for O in O'
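
The two notions on this slide can be stated a little more explicitly. The block below is a paraphrase in commonly used notation, not a quotation from Grau et al.: $\mathrm{Sig}(\cdot)$ for the signature, $\models$ for entailment, and $S$ for the set of symbols re-used from $Q$ are assumptions introduced here.

```latex
% Conservative extension: adding P does not change what follows about the re-used symbols S
\[
P \cup Q \text{ is a conservative extension of } Q \text{ w.r.t. } S
\iff
\text{for every axiom } \alpha \text{ with } \mathrm{Sig}(\alpha) \subseteq S:\;
P \cup Q \models \alpha \;\Leftrightarrow\; Q \models \alpha
\]

% Module: the subset O'_1 answers every query over O's vocabulary exactly as O' would
\[
O'_1 \subseteq O' \text{ is a module for } O \text{ in } O'
\iff
\text{for every axiom } \alpha \text{ with } \mathrm{Sig}(\alpha) \subseteq \mathrm{Sig}(O):\;
O \cup O' \models \alpha \;\Leftrightarrow\; O \cup O'_1 \models \alpha
\]
```

Read this way, a module is a smaller import that cannot be told apart from the full ontology by any query phrased in the importing ontology's own vocabulary.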
  9. 9. Segments vs. Modules
      The segmentation heuristic used is in contrast to (and predates) those of Grau et al. (2008) that produce modules with 100% semantic fidelity
      Segments sacrifice semantic fidelity for an expedient extraction process
      The (tractable) calculation of deductive, conservative extensions for EL is an open research problem
  10. 10. Materials
      SNOMED-CT
      Foundational Model of Anatomy (FMA)
      Common anatomy signature
  11. 11. Reference Clinical Ontologies
      There is a reasonable consensus around two reference ontologies that cover a substantial portion of clinical medicine
      SNOMED-CT and the FMA
      Both leverage an underlying formal knowledge representation
  12. 12. SNOMED-CT
      A comprehensive terminological framework for clinical documentation and reporting
      Comprises about half a million concepts:
      Clinical findings, procedures, body structures, organisms, substances, pharmaceutical products, specimens, quantitative measures, and clinical situations
      Has an underlying description logic (EL family)
      The EL family has been shown to be suitable for medical terminology
      And subsequently, ELHR+, the performance target of many modern classifiers
  13. 13. Technical challenges:
      Its size discourages the use of logical inference systems to manage and process it (due to performance issues)
      Most description logic systems run into memory exhaustion when classifying it in its entirety (there have been recent advances here)
      In some cases, its definitions are inconsistent or incomplete (more on this later)
      Policy pressures (opportunity):
      Participants in the Meaningful Use program must capture EHR problem lists based on ICD-9 or SNOMED-CT
  14. 14. Using Modularization for Quality Assurance
      Plenty of (recent) work on quality assurance of SNOMED-CT
      Using Semantic Web technologies (and lattice theory) for quality assurance of large biomedical ontologies (Zhang et al. 2010)
      Identifying incorrect or clinically misleading SNOMED-CT inferences that arose from use of SNOMED-CT (Rector et al. 2011)
      More recent QA of SNOMED-CT (Rector 2011) leverages extraction of manageable modules and discusses the value to domain experts of browsing SNOMED-CT via a module built from a set of terms relevant to a domain or application
  15. 15. Foundational Model of Anatomy
      Goal is to conceptualize the physical objects and spaces that constitute the human body
      Leverages a frame-based knowledge representation to formulate over 75,000 concepts including:
      Macroscopic, microscopic, and sub-cellular canonical anatomy
      Anatomy is fundamental to biomedical domains
  16. 16. Concepts are connected by several mereological relations
      Primarily concerned with part_of and has_part
      Adheres to a strict Aristotelian modeling paradigm
      Ensures definitions are consistent and state the essence of anatomy in terms of their characteristics
      Using the July 24th 2008 ALPHA version of the FMA 2.0 in the OBO Foundry
  17. 17. Common Anatomy Signature
      There is a significant overlap between anatomy terms in SNOMED-CT and FMA
      Bodenreider and Zhang (2006) analyzed this overlap
      Leveraged lexical and structural analysis
      Identified ~7500 common concepts
      Referred to as S_anatomy
  18. 18. Small Detail: SEP Triplets
      SNOMED-CT uses SEP triplets to model anatomy concepts and their relationships to each other
      For every proper SNOMED-CT anatomy concept (an Entire class), there are two auxiliary classes:
      A Structure class
      A Part class
  19. 19. Example
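
As a rough illustration of SEP triplets, the sketch below builds a Structure, Entire, and Part class for each anatomy concept and encodes part-of assertions purely as is_a links, so that a part-whole question becomes a subsumption check. The encoding follows the usual description of the SEP scheme (Entire and Part are subclasses of Structure; the Structure class of a part is placed under the Part class of its whole). The helper names, class labels, and the toy anatomy are assumptions for this example, not actual SNOMED-CT descriptions.

```python
def sep_triplet(name):
    """Return hypothetical labels for the Structure, Entire and Part classes of a concept."""
    return {"S": f"{name} structure", "E": f"entire {name}", "P": f"{name} part"}


def build_is_a(part_of_pairs):
    """Encode (part, whole) assertions as is_a links between SEP classes."""
    is_a = {}

    def add(child, parent):
        is_a.setdefault(child, set()).add(parent)

    concepts = {concept for pair in part_of_pairs for concept in pair}
    for concept in concepts:
        triplet = sep_triplet(concept)
        add(triplet["E"], triplet["S"])  # the entire organ is a kind of structure
        add(triplet["P"], triplet["S"])  # any proper part is a kind of structure
    for part, whole in part_of_pairs:
        # part-of is expressed through subsumption: S(part) is_a P(whole)
        add(sep_triplet(part)["S"], sep_triplet(whole)["P"])
    return is_a


def is_subsumed_by(is_a, child, ancestor):
    """Follow is_a links upward from child and report whether ancestor is reached."""
    stack, seen = [child], set()
    while stack:
        current = stack.pop()
        if current == ancestor:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(is_a.get(current, ()))
    return False


def is_part_of(is_a, part, whole):
    """A concept is a proper part of whole when E(part) is subsumed by P(whole)."""
    return is_subsumed_by(is_a, sep_triplet(part)["E"], sep_triplet(whole)["P"])


hierarchy = build_is_a([
    ("left atrium", "heart"),
    ("left atrial appendage", "left atrium"),
])
print(is_part_of(hierarchy, "left atrial appendage", "heart"))
print(is_part_of(hierarchy, "heart", "left atrium"))
```

Transitivity of part-of needs no special rule here: it falls out of the transitivity of is_a, which is the convenience the triplet scheme is designed to provide.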
  20. 20. Main motivation is to rely on subsumption to reason about part-whole relationships
      SNOMED-CT is moving away from this, but for the purpose of using it in concert with the FMA, this is still an issue
      Previous work (Suntisrivaraporn 2007) demonstrated how an expressive description logic can be used to more directly represent mereological relations
  21. 21. Build on this but re-use terms (a transliteration) from a reference ontology of anatomy rather than re-using SNOMED-CT terms
      To preserve the meaning of anatomy terms but increase the (latent) knowledge about them and provide a terminology path to additional terms of interest
  22. 22. Reifying SEP triplets
      Need to replace SNOMED-CT anatomy terms in a way that preserves the intent of the SEP anatomy scheme
      Transcribe them into a more expressive description logic
      Define a set of rules to determine how axioms involving mapped SNOMED-CT terms are replaced
      Schulz et al. (1998) describe how to logically identify components of an SEP triplet
  23. 23.
  24. 24. Method
      Start with a list of user-specified SNOMED-CT concepts
      Determines the domain
      3-step process resulting in:
      A SNOMED-CT module: O'snct-fma
      Transliteration of SEP triplets
      FMA segment: O'fma-snct
      Directly merge results into a single ontology
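
The last two steps (transliteration and merge) can be pictured with a toy example over subject-relation-object triples. This is a deliberately naive sketch: it substitutes mapped terms one-for-one and does not reproduce the SEP-aware replacement rules the deck refers to. `ANATOMY_MAP`, the `snomed:` and `fma:` labels, and both toy modules are hypothetical stand-ins, not real ontology content.

```python
# Hypothetical stand-in for the common anatomy signature (SNOMED-CT term -> FMA term)
ANATOMY_MAP = {"snomed:Atrial structure": "fma:Cardiac atrium"}

# Toy SNOMED-CT module extracted around a seed concept (illustrative triples only)
snomed_module = {
    ("snomed:Atrial fibrillation", "is_a", "snomed:Fibrillation"),
    ("snomed:Atrial fibrillation", "finding_site", "snomed:Atrial structure"),
}

# Toy FMA segment extracted around the mapped anatomy term (illustrative triples only)
fma_segment = {
    ("fma:Cardiac atrium", "part_of", "fma:Heart"),
}


def transliterate(axioms, mapping):
    """Swap every mapped SNOMED-CT anatomy term for its FMA counterpart."""
    return {(mapping.get(s, s), relation, mapping.get(o, o)) for s, relation, o in axioms}


def merge(*modules):
    """Union the triples of several modules into one ontology."""
    merged = set()
    for module in modules:
        merged |= module
    return merged


merged = merge(transliterate(snomed_module, ANATOMY_MAP), fma_segment)
for triple in sorted(merged):
    print(triple)
```

After the substitution, the clinical finding points at an FMA term, and the FMA segment supplies further anatomical links from that term. That is the "terminology path to additional terms of interest" in miniature.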
  25. 25. Segmenting and Merging Domain-specific Ontology Modules for Clinical Informatics (Ogbuji 2010)
  26. 26.
  27. 27. Collecting the domain of discourse
      Sahoo et al. (2011): automatically extract a minimal common set of terms (upper-domain ontology) from an existing domain ontology
      Can be used to survey the generation of anatomy and clinical terminology modules:
      “For a given domain, what are the most general categories of (clinical) terminology that can be automatically extracted from specific distributions of SNOMED-CT and the FMA?”
  28. 28. Demonstration
      Implementation (Python)
      Example: Atrial Fibrillation (disorder)