Semantic Web Technologies: A Paradigm for Medical Informatics


Patient registries, Electronic Health Record (EHR) systems, and clinical research repositories of the future share several needs: semantic interoperability, adoption of standardized clinical terminology, ad hoc and distributed querying interfaces, and integration with extant databases and web-based systems. A suite of standards has recently emerged from the World Wide Web Consortium (W3C), the consortium responsible for developing and overseeing the protocols of the World Wide Web (WWW). These standards were conceived to address data integration challenges associated with internet and intranet applications, and many of them are capable of addressing the challenges common to health information systems. This talk gives an introductory overview of these technologies, explains how they address these challenges, and briefly discusses projects where they have been used.

Published in: Technology, Education

  1. Semantic Web Technologies: A Paradigm for Medical Informatics
     Chimezie Ogbuji (Owner, Metacognition LLC.)
  2. Who I am
     • Circa 2001: Introduced to web standards and Semantic Web technologies
     • 2003-2011: Lead architect of CCF in-house clinical repository project
     • 2006-2011: Member representative of CCF in the World Wide Web Consortium (W3C)
       ◦ Editor of various standards and chair of the Semantic Web Health Care and Life Sciences Interest Group
     • 2011-2012: Senior Research Associate at CWRU Center for Clinical Investigations
     • 2012-current: Started business providing resource and data management software for home healthcare agencies (Metacognition LLC)
  3. Medical Informatics Challenges
     • Semantic interoperability
       ◦ Exchange of data with common meaning between sender and receiver
     • Most of the intended benefits of HIT depend on interoperability between systems
     • Difficulties integrating patient record systems with other information resources are among the major issues hampering their effectiveness
       ◦ Interoperability is a major goal for meaningful use of Electronic Health Records (EHR)
     Rodrigues et al. 2013; Kadry et al. 2010; Shortliffe and Cimino, 2006
  4. Requirements and Solutions
     • Semantic interoperability requires:
       ◦ Structured data
       ◦ A common controlled vocabulary
     • Solutions emphasize the meaning of data rather than how they are structured
       ◦ “Semantic” paradigms
  5. Registries and Research DBs
     • Patient registries and clinical research repositories capture data elements in a uniform manner
     • The structure of the underlying data needs to be able to evolve along with the investigations they support
     • Thus, schema extensibility is important
  6. Querying Interfaces
     • Standardized interfaces for querying facilitate:
       ◦ Accessibility to clinical information systems
       ◦ Distributed querying of data from where they reside
     • Requires:
       ◦ Semantically-equivalent data structures
     • Alternatively, data are centralized in data warehouses
     Austin et al. 2007, “Implementation of a query interface for a generic record server”
  7. Biomedical Ontologies
     • Ontologies are artifacts that conceptualize a domain as a taxonomy of classes and constraints on relationships between their members
     • Represented in a particular formalism
     • Increasingly adopted as a foundation for the next generation of biomedical vocabularies
     • Construction involves representing a domain of interest independently of the behavior of the applications that use the ontology
     • An important means of achieving semantic interoperability
  8. Biomedical Ontology Communities
     • Prominent examples of adoption by life science and healthcare terminology communities:
       ◦ The Open Biological and Biomedical Ontologies (OBO) Foundry
       ◦ Gene Ontology (GO)
       ◦ National Center for Biomedical Ontology (NCBO) BioPortal
       ◦ International Health Terminology Standards Development Organization (IHTSDO)
  9. Semantic Web and Technologies
     • The Semantic Web is a vision of how the existing infrastructure of the World Wide Web (WWW) can be extended such that machines can interpret the meaning of data on it
     • Semantic Web technologies are the standards and technologies that have been developed to achieve the vision
  10. An Analogy
      • (Technological) singularity is a theoretical moment when artificial intelligence (AI) will have progressed to a greater-than-human intelligence
      • Despite remaining in the realm of science fiction, it has motivated many useful developments along the way
        ◦ The use of ontologies for knowledge representation and IBM Watson capabilities, for example
  11. Background: Graphs
      • Graphs are data structures comprising nodes and edges that connect them
      • The edges can be directional
      • Either the nodes, the edges, or both can be labeled
      • The labels provide meaning to the graphs (edge labels in particular)
      [Diagram: two nodes connected by an edge]
  12. Resource Description Framework
      • The Resource Description Framework (RDF) is a graph-based knowledge representation language for describing resources
      • Its edges are directional and both nodes and edges are labeled
      • It uses Uniform Resource Identifiers (URIs) for labeling
      • Foundation for Semantic Web technologies
  13. RDF: Continued
      • The edges are statements (triples) that go from a subject to an object
      • Some objects are text values
      • Some subjects and objects can be left unlabeled (blank nodes)
        ◦ Anonymous resources: not important to label them uniquely
      • The URI of the edge is the predicate
      • Predicates used together for a common purpose are a vocabulary
  14.
      • Subject: Dr. X (a URI)
      • Object: Chime
      • Predicate: treats
      • Vocabulary:
        ◦ treats, subject of record, author, and full name
      [Diagram: RDF graph with nodes Chime and Dr. X and the text value "Chimezie Ogbuji"; edges labeled treats, subject of record, author, and full name]
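The graph on slide 14 can be written down as a set of (subject, predicate, object) triples. The following is a minimal sketch in plain Python rather than an RDF library. The `http://example.org/` namespace, the local names, and the blank node `_:dx1` standing for the diagnosis record are assumptions made for illustration; the edge directions follow the query patterns on slide 23.

```python
# Hypothetical namespace; the slides only give short labels such as "treats".
EX = "http://example.org/"


def uri(name):
    """Build a full URI from a short local name."""
    return EX + name


# A blank node is an unlabeled resource; it is marked here with a "_:" prefix.
dx = "_:dx1"

# Each triple is one directed, labeled edge: (subject, predicate, object).
graph = {
    (uri("DrX"), uri("treats"), uri("Chime")),
    (uri("DrX"), uri("author"), dx),
    (dx, uri("subjectOfRecord"), uri("Chime")),
    # Objects may also be plain text values.
    (uri("Chime"), uri("fullName"), "Chimezie Ogbuji"),
}

# The vocabulary is the set of predicates used together in the graph.
vocabulary = {predicate for (_, predicate, _) in graph}
for predicate in sorted(vocabulary):
    print(predicate)
```

Because every statement has the same three-part shape, adding a new kind of fact only means adding a new predicate, not changing a table layout.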
  15. RDF vocabularies
      • How meaning is interpreted from an RDF graph
      • There are vocabularies that constrain how predicates are used
        ◦ Want a sense of treats where the subject is a clinician and the object is a patient
      • There is a predicate relating resources to the classes they are a member of (type)
      • There are vocabularies that define constraints on class hierarchies
      • These comprise a basic RDF Schema (RDFS) language
      • Represented as an RDF graph
  16. [Diagram: the example graph (Chime, Dr. X, treats, subject of record, author) extended with the classes Patient, Physician, Person, Hypertension DX, and Clinical Diagnosis, connected by type and is a edges]
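The constraints described on slides 15 and 16 can be illustrated with a small forward-chaining loop. This is a sketch under stated assumptions, not a complete RDFS reasoner: it covers only class membership through `is a` edges and the subject and object constraints on a predicate (called `domain` and `range` here, after the RDFS terms). Using `Physician` as the subject class of `treats` and the identifier `dx1` are assumptions for illustration.

```python
TYPE, IS_A, DOMAIN, RANGE = "type", "is a", "domain", "range"

triples = {
    # Class hierarchy from the slide ("is a" edges).
    ("Patient", IS_A, "Person"),
    ("Physician", IS_A, "Person"),
    ("Hypertension DX", IS_A, "Clinical Diagnosis"),
    # Assumed constraints on how the predicate "treats" is used.
    ("treats", DOMAIN, "Physician"),
    ("treats", RANGE, "Patient"),
    # Instance data.
    ("Dr. X", "treats", "Chime"),
    ("dx1", TYPE, "Hypertension DX"),
}


def rdfs_closure(graph):
    """Apply the three rules repeatedly until no new triple appears."""
    inferred = set(graph)
    while True:
        new = set()
        for s, p, o in inferred:
            for s2, p2, o2 in inferred:
                # The subject of a predicate belongs to the predicate's domain.
                if s2 == p and p2 == DOMAIN:
                    new.add((s, TYPE, o2))
                # The object of a predicate belongs to the predicate's range.
                if s2 == p and p2 == RANGE:
                    new.add((o, TYPE, o2))
                # Members of a class are members of its parent class.
                if p == TYPE and s2 == o and p2 == IS_A:
                    new.add((s, TYPE, o2))
        if new <= inferred:
            return inferred
        inferred |= new


closure = rdfs_closure(triples)
for triple in sorted(closure - triples):
    print(triple)
```

Nothing in the input says that Dr. X is a Physician or a Person; both statements follow from the constraints on `treats` and the class hierarchy.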
  17. Ontologies for RDF
      • The Web Ontology Language (OWL) is used to describe ontologies for RDF graphs
      • More sophisticated constraints than RDFS
      • Commonly expressed as an RDF graph
      • Defines the meaning of RDF statements through constraints:
        ◦ On their predicates
        ◦ On the classes of the resources they relate
  18. [Diagram: the same graph as on slide 16 (Chime, Dr. X, Patient, Physician, Person, Hypertension DX, Clinical Diagnosis; edges treats, subject of record, author, type, is a), annotated "Governed by OWL / RDFS for domain"]
  19. OWL Formats
      • Most common format for describing ontologies
      • Distribution format of ontologies in the NCBO BioPortal
      • SNOMED CT distributions include an OWL representation
        ◦ RDF graphs can describe medical content in a SNOMED CT-compliant way through the use of this vocabulary
  20. Validation and Deduction
      • OWL is based on a formal, mathematical logic that can be used for validating the structure of an ontology and RDF data that conform to it (consistency checking)
      • The same logic is used to deduce additional RDF statements implied by the meaning of a given RDF graph (logical inference)
      • Logical reasoners are used for both tasks
  21. Inference
      • Can infer anatomical location from SNOMED CT definitions
      [Diagram: a resource with type Hypertension DX; inferred edge finding site to a resource with type Systemic circulatory system structure]
      Hypertension DX <-> 1201005 / “Benign essential hypertension (disorder)”
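The inference on slide 21 can be sketched as follows. The sketch assumes that the class definition says every Hypertension DX has some finding site that is a Systemic circulatory system structure, and it models that definition with a hypothetical `DEFINITIONS` dictionary rather than real OWL axioms. A logical reasoner does far more than this; the code only shows the shape of the deduction.

```python
from itertools import count

# Hypothetical stand-in for a class definition:
# class -> list of (property, class of the implied value).
DEFINITIONS = {
    "Hypertension DX": [("finding site", "Systemic circulatory system structure")],
}


def infer_sites(triples, definitions):
    """Add the statements implied by the class definitions of typed resources."""
    fresh = count(1)
    inferred = set(triples)
    for s, p, o in sorted(triples):
        if p != "type":
            continue
        for prop, filler in definitions.get(o, []):
            # The implied value is anonymous, so it becomes a blank node.
            node = f"_:b{next(fresh)}"
            inferred.add((s, prop, node))
            inferred.add((node, "type", filler))
    return inferred


data = {("dx1", "type", "Hypertension DX")}
result = infer_sites(data, DEFINITIONS)
for triple in sorted(result - data):
    print(triple)
```

The diagnosis record never states an anatomical location; it is deduced from the definition attached to the diagnosis class.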
  22. Querying RDF Graphs
      • SPARQL is the official query language for RDF graphs
      • Comparable to relational query languages
        ◦ Primary difference: it queries RDF triples, whereas SQL queries tables of arbitrary dimensions
      • Includes various web protocols for querying RDF graphs
      • Foundation of SPARQL is the triple pattern
      • (?clinician, treats, ?patient)
        ◦ ?clinician and ?patient are variables (like a wildcard)
  23. Which physicians have given essential hypertension diagnoses and to whom?
      [Diagram: query graph with variable nodes ?physician, ?patient, and ?dx; edges treats, subject of record, author, and type (to Hypertension DX)]
      Triple patterns:
        (?physician, author, ?dx)
        (?physician, treats, ?patient)
        (?dx, subject of record, ?patient)
        (?dx, type, Hypertension DX)
      Results:
        ?physician   ?patient   ?dx
        Dr. X        Chime      …
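The way the four triple patterns on slide 23 combine can be sketched in plain Python. This is not a SPARQL engine, only an illustration of the matching idea: each pattern is matched against every triple, and a variable must keep the same value across all patterns. The identifier `dx1` and the extra row for "Dr. Y" are made up for the example.

```python
def is_var(term):
    """Variables are written with a leading question mark, as in SPARQL."""
    return term.startswith("?")


def match_pattern(pattern, triple, bindings):
    """Extend bindings so that pattern equals triple, or return None."""
    result = dict(bindings)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            # A variable that is already bound must agree with the new value.
            if result.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return result


def query(graph, patterns):
    """Evaluate the patterns one after another, keeping compatible bindings."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for bindings in solutions:
            for triple in graph:
                extended = match_pattern(pattern, triple, bindings)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


graph = {
    ("Dr. X", "treats", "Chime"),
    ("Dr. X", "author", "dx1"),
    ("dx1", "subject of record", "Chime"),
    ("dx1", "type", "Hypertension DX"),
    ("Dr. Y", "treats", "Pat"),  # hypothetical row with no diagnosis
}

patterns = [
    ("?physician", "author", "?dx"),
    ("?physician", "treats", "?patient"),
    ("?dx", "subject of record", "?patient"),
    ("?dx", "type", "Hypertension DX"),
]

rows = query(graph, patterns)
print(rows)
```

Dr. Y is filtered out because no diagnosis authored by Dr. Y exists, even though the `treats` pattern alone would match.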
  24. SPARQL over Relational Data
      • Most common implementations convert SPARQL to SQL and evaluate over:
        ◦ a relational database designed for RDF storage
        ◦ an existing relational database
      • There are products for both approaches
      • The former requires native storage of RDF
        ◦ Relational structure doesn’t change even as RDF vocabulary does (schema
      Elliott et al. 2009, “A Complete Translation from SPARQL into Efficient SQL”
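The first approach on slide 24 can be sketched with Python's built-in `sqlite3` module. The single `triples(s, p, o)` table and the `bgp_to_sql` helper are assumptions for illustration, not the translation from the cited paper: each triple pattern becomes one alias of the table, constants become `WHERE` conditions, and a repeated variable becomes a join condition.

```python
import sqlite3


def is_var(term):
    return term.startswith("?")


def bgp_to_sql(patterns):
    """Translate triple patterns into one SQL self-join over triples(s, p, o).

    Assumes the patterns contain at least one variable.
    """
    select, where, params, seen = [], [], [], {}
    for i, pattern in enumerate(patterns):
        for column, term in zip("spo", pattern):
            ref = f"t{i}.{column}"
            if not is_var(term):
                where.append(f"{ref} = ?")          # constant term
                params.append(term)
            elif term in seen:
                where.append(f"{ref} = {seen[term]}")  # join on shared variable
            else:
                seen[term] = ref
                select.append(f'{ref} AS "{term[1:]}"')
    tables = ", ".join(f"triples AS t{i}" for i in range(len(patterns)))
    sql = f"SELECT {', '.join(select)} FROM {tables}"
    if where:
        sql += " WHERE " + " AND ".join(where)
    return sql, params


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("Dr. X", "treats", "Chime"),
        ("Dr. X", "author", "dx1"),
        ("dx1", "subject of record", "Chime"),
        ("dx1", "type", "Hypertension DX"),
    ],
)

patterns = [
    ("?physician", "author", "?dx"),
    ("?physician", "treats", "?patient"),
    ("?dx", "subject of record", "?patient"),
    ("?dx", "type", "Hypertension DX"),
]
sql, params = bgp_to_sql(patterns)
rows = conn.execute(sql, params).fetchall()
print(sql)
print(rows)
```

The table definition never mentions `treats` or `author`, which is why the relational structure can stay fixed while the RDF vocabulary grows.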
  25. SPARQL over Existing Relational Data
      • “Virtual RDF view”
        ◦ Translation to SQL follows a given mapping from existing relational structures to an RDF vocabulary
        ◦ Allows non-disruptive evolution of existing systems
        ◦ Well-suited as a standard querying interface over clinical data repositories
        ◦ They can be queried with SPARQL, securely over encrypted HTTP
  26. [Architecture diagram. Labels: Relational; RDF (SNOMED CT perhaps); Mapping and Translation layer; Secure HTTP; SPARQL; SQL; Legacy / existing applications; Patient registry or data repository; 3rd party applications]
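The mapping and translation layer of slides 25 and 26 can be sketched as a lookup from RDF predicates to columns of an existing table. The `diagnosis` table, its column names, and the `MAPPING` dictionary are all hypothetical; a real deployment would use a mapping language and a full SPARQL-to-SQL translator. No triples are stored: each triple pattern is answered by a SQL query against the legacy table.

```python
import sqlite3

# Hypothetical mapping: predicate -> (table, subject column, object column).
MAPPING = {
    "author": ("diagnosis", "physician", "id"),
    "subject of record": ("diagnosis", "id", "patient"),
    "code": ("diagnosis", "id", "code"),
}


def match(conn, subject, predicate, obj):
    """Answer one triple pattern (None is a wildcard) from the legacy table."""
    # Table and column names come from the trusted MAPPING, never from input.
    table, s_col, o_col = MAPPING[predicate]
    sql = f"SELECT {s_col}, {o_col} FROM {table}"
    conditions, params = [], []
    for column, value in ((s_col, subject), (o_col, obj)):
        if value is not None:
            conditions.append(f"{column} = ?")
            params.append(value)
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return [(s, predicate, o) for s, o in conn.execute(sql, params)]


# A stand-in for an existing registry table used by legacy applications.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE diagnosis (id TEXT, patient TEXT, physician TEXT, code TEXT)"
)
conn.execute("INSERT INTO diagnosis VALUES ('dx1', 'Chime', 'Dr. X', '1201005')")

print(match(conn, None, "author", None))
print(match(conn, "dx1", "subject of record", None))
```

Legacy applications keep issuing SQL against `diagnosis` unchanged, while new clients see the same rows as RDF statements.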
  27. Example: Cleveland Clinic (SemanticDB)
      • Content repository and data production system released in Jan. 2008
      • 80 million (native) RDF statements
        ◦ Uses vocabulary from a patient record OWL ontology for the registry
      • Based on
        ◦ Existing registry of heart surgery and CV interventions
        ◦ 200,000 patient records
        ◦ Generating over 100 publications per year
      Pierce et al. 2012, “SemanticDB: A Semantic Web Infrastructure for Clinical Research and Quality Reporting”
  28. Cohort Identification
      • Interface developed in conjunction with Cycorp
      • Leverages their logical reasoning system (Cyc)
        ◦ Identifies cohorts using natural language (NL) sentence fragments
        ◦ Converts fragments to SPARQL
        ◦ SPARQL is evaluated against the RDF store
  29. Example: Mayo Clinic (MCLSS)
      • Mayo Clinic Life Sciences System (MCLSS)
        ◦ Effort to represent Mayo Clinic EHR data as RDF graphs
        ◦ Patient demographics, diagnoses, procedures, lab results, and free-text notes
        ◦ Goal was to wrap the MCLSS relational database and expose it as read-only, queryable RDF graphs that conform to standard ontologies
      Pathak et al. 2012, “Using Semantic Web Technologies for Cohort Identification from Electronic Health Records for Clinical Research”
  30. Example: Mayo Clinic (CEM)
      • Clinical Element Model (CEM)
        ◦ Represents logical structure of data in EHR
        ◦ Goal: translate CEM definitions into OWL and patient (instance) data into conformant RDF
        ◦ Use tools (logical reasoners) to check the semantic consistency of the ontology and the instance data, and to extract new knowledge via deduction
        ◦ Instance data validation:
          ▪ correct number of linked components, value within data range, existence of units, etc.
      Tao et al. 2012, “A semantic-web oriented representation of the clinical element model for secondary use of electronic health records data”
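The three instance-data checks listed on slide 30 can be illustrated with plain procedural code. This is only a stand-in: the Mayo Clinic work expresses such constraints in OWL and checks them with logical reasoners. The constraint values, the field names, and the blood pressure example are all hypothetical.

```python
# Hypothetical constraint for one kind of clinical element.
CONSTRAINT = {"components": 1, "min": 0, "max": 300, "units": {"mmHg"}}


def validate(instance, constraint):
    """Return a list of problems found in one instance record."""
    errors = []
    # Check 1: correct number of linked components.
    if len(instance.get("components", [])) != constraint["components"]:
        errors.append("wrong number of linked components")
    # Check 2: value within the allowed data range.
    value = instance.get("value")
    if value is None or not constraint["min"] <= value <= constraint["max"]:
        errors.append("value outside data range")
    # Check 3: a unit exists and is one of the expected units.
    if instance.get("unit") not in constraint["units"]:
        errors.append("missing or unknown unit")
    return errors


good = {"components": ["systolic"], "value": 142, "unit": "mmHg"}
bad = {"components": [], "value": 900}

print(validate(good, CONSTRAINT))
print(validate(bad, CONSTRAINT))
```

Expressing the same constraints in an ontology instead of code lets one reasoner validate every kind of clinical element without element-specific logic.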
  31. Summary
      • Schema extensibility
        ◦ Use of RDF
      • Semantic interoperability
        ◦ Domain modeling using OWL and RDFS
      • Standardized query interfaces
        ◦ Querying over SPARQL
      • Incremental, non-disruptive adoption
        ◦ Virtual RDF views
      • Main challenge: highly disruptive innovation