Semantic Web Technologies: A
Paradigm for Medical
Informatics
Chimezie Ogbuji (Owner, Metacognition LLC.)
http://metacognition.info/presentations/SWTMedicalInformatics.pdf
http://metacognition.info/presentations/SWTMedicalInformatics.ppt
Who I am
 Circa 2001: Introduced to web standards and
Semantic Web technologies
 2003-2011: Lead architect of CCF in-house
clinical repository project
 2006-2011: Member representative of CCF in
World-wide Web Consortium (W3C)
◦ Editor of various standards and Semantic Web
Health Care and Life Sciences Interest Group
chair
 2011-2012: Senior Research Associate at
CWRU Center for Clinical Investigations
 2012-current: Started business providing
resource and data management software for
home healthcare agencies (Metacognition
LLC)
Medical Informatics
Challenges
 Semantic interoperability
◦ Exchange of data with common meaning
between sender and receiver
 Most of the intended benefits of health
information technology (HIT) depend on
interoperability between systems
 Difficulties integrating patient record
systems with other information resources
are among the major issues hampering
their effectiveness
◦ Interoperability is a major goal for meaningful
use of Electronic Health Records (EHR)
Rodrigues et al. 2013; Kadry et al. 2010; Shortliffe and Cimino, 2006
Requirements and Solutions
 Semantic interoperability requires:
◦ Structured data
◦ A common controlled vocabulary
 Solutions emphasize the meaning of
data rather than how they are
structured
◦ “Semantic” paradigms
Registries and Research DBs
 Patient registries and clinical research
repositories capture data elements in
a uniform manner
 The structure of the underlying data
needs to be able to evolve along with
the investigations they support
 Thus, schema extensibility is
important
Querying Interfaces
 Standardized interfaces for querying
facilitate:
◦ Accessibility to clinical information
systems
◦ Distributed querying of data from where
they reside
 Requires:
◦ Semantically-equivalent data structures
 Alternatively, data are centralized in
data warehouses
Austin et al. 2007, “Implementation of a query interface for a generic record server”
Biomedical Ontologies
 Ontologies are artifacts that
conceptualize a domain as a taxonomy
of classes and constraints on
relationships between their members
 Represented in a particular formalism
 Increasingly adopted as a foundation for
the next generation of biomedical
vocabularies
 Construction involves representing a
domain of interest independently of the
behavior of the applications that use the
ontology
 Important means towards achieving
semantic interoperability
Biomedical Ontology
Communities
 Prominent examples of adoption by
life science and healthcare
terminology communities:
◦ The Open Biological and Biomedical
Ontologies (OBO) Foundry
◦ Gene Ontology (GO)
◦ National Center for Biomedical Ontology
(NCBO) BioPortal
◦ International Health Terminology
Standards Development Organisation
(IHTSDO)
Semantic Web and
Technologies
 The Semantic Web is a vision of how
the existing infrastructure of the
World Wide Web (WWW) can be
extended so that machines can
interpret the meaning of the data on it
 Semantic Web technologies are the
standards and technologies that have
been developed to achieve the vision
An Analogy
 (Technological) singularity is a
theoretical moment when artificial
intelligence (AI) will have progressed
to a greater-than-human intelligence
 Despite remaining in the realm of
science fiction, it has motivated many
useful developments along the way
◦ The use of ontologies for knowledge
representation and IBM Watson
capabilities, for example
Background: Graphs
 Graphs are data structures
comprising nodes and edges that
connect them
 The edges can be directional
 Either the nodes, the edges, or both
can be labeled
 The labels provide meaning to the
graphs (edge labels in particular)
[Figure: two nodes connected by an edge]
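A minimal sketch of the idea above, assuming a directed graph whose edges carry labels. The class name `LabeledGraph` and the node names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


class LabeledGraph:
    """A directed graph whose edges carry labels."""

    def __init__(self):
        # Maps a source node to a list of (edge_label, target_node) pairs.
        self._out = defaultdict(list)

    def add_edge(self, source, label, target):
        """Add a directed, labeled edge from source to target."""
        self._out[source].append((label, target))

    def neighbors(self, source, label=None):
        """Return targets reachable from source, optionally filtered by edge label."""
        return [t for (l, t) in self._out.get(source, []) if label is None or l == label]


graph = LabeledGraph()
graph.add_edge("Node A", "edge", "Node B")
```

Here `graph.neighbors("Node A", "edge")` follows only edges with the given label, which is the sense in which edge labels give a graph its meaning.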
Resource Description
Framework
 The Resource Description Framework
(RDF) is a graph-based knowledge
representation language for describing
resources
 Its edges are directional, and both
nodes and edges are labeled
 It uses Uniform Resource Identifiers
(URIs) for labeling
 Foundation for Semantic Web
technologies
RDF: Continued
 The edges are statements (triples) that
go from a subject to an object
 Some objects are text values
 Some subjects and objects can be left
unlabeled (Blank nodes)
◦ Anonymous resources: not important to label
them uniquely
 The URI of the edge is the predicate
 Predicates used together for a common
purpose are a vocabulary
 Subject: Dr. X (a URI)
 Object: Chime
 Predicate: treats
 Vocabulary:
◦ treats, subject of record, author, and full
name
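The example can be sketched in plain Python as a set of (subject, predicate, object) tuples. The namespace `http://example.org/clinic/` is a placeholder (the slides give no actual URIs), and the blank node standing in for a record, along with the direction of the author and subject of record edges, is an assumption made for illustration.

```python
# Hypothetical namespace; the slides do not give actual URIs.
EX = "http://example.org/clinic/"


def uri(local_name):
    """Build a URI in the hypothetical example namespace."""
    return EX + local_name


# Each statement is a (subject, predicate, object) triple.
triples = {
    (uri("DrX"), uri("treats"), uri("Chime")),
    # Object is a text value rather than a URI.
    (uri("Chime"), uri("fullName"), "Chimezie Ogbuji"),
    # "_:record1" is a blank node: a resource that is not labeled with a URI.
    (uri("DrX"), uri("author"), "_:record1"),
    ("_:record1", uri("subjectOfRecord"), uri("Chime")),
}

# The vocabulary is the set of predicates used together in the graph.
vocabulary = {predicate for (_, predicate, _) in triples}
```

Adding a statement with a new predicate only adds a tuple to the set; nothing about the structure that holds the statements has to change.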
[Figure: RDF graph with nodes Dr. X, Chime, and the text value "Chimezie Ogbuji", and edges labeled treats, subject of record, author, and full name]
RDF vocabularies
 How meaning is interpreted from an RDF
graph
 There are vocabularies that constrain how
predicates are used
◦ Want a sense of treats where the subject is a
clinician and the object is a patient
 There is a predicate relating resources to the
classes they are a member of (type)
 There are vocabularies that define
constraints on class hierarchies
 These comprise a basic RDF Schema
(RDFS) language
 Represented as an RDF graph
[Figure: RDF graph with instance nodes Chime and Dr. X, class nodes Patient, Physician, Person, Clinical Diagnosis, and Hypertension DX, and edges labeled treats, subject of record, author, type, and is a]
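A rough sketch of how such a vocabulary lets a program derive new statements. It applies three RDFS entailment rules (domain, range, and subclass propagation) to a tiny graph. The short strings stand in for URIs, with "is a" playing the role of rdfs:subClassOf and "type" the role of rdf:type. The specific domain and range declarations for treats are assumptions based on the bullet above, not taken from an actual schema.

```python
def rdfs_closure(triples):
    """Apply three RDFS entailment rules until no new triples appear."""
    inferred = set(triples)
    while True:
        new = set()
        for (s, p, o) in inferred:
            for (p2, rel, cls) in inferred:
                if p2 == p and rel == "domain":
                    new.add((s, "type", cls))      # subject gets the domain class
                if p2 == p and rel == "range":
                    new.add((o, "type", cls))      # object gets the range class
            if p == "type":
                for (c, rel, sup) in inferred:
                    if rel == "is a" and c == o:
                        new.add((s, "type", sup))  # members of a class belong to its superclass
        if new <= inferred:
            return inferred
        inferred |= new


graph = {
    ("treats", "domain", "Physician"),  # assumed constraint on treats
    ("treats", "range", "Patient"),     # assumed constraint on treats
    ("Patient", "is a", "Person"),
    ("Physician", "is a", "Person"),
    ("Dr. X", "treats", "Chime"),
}
closure = rdfs_closure(graph)
```

From the single statement that Dr. X treats Chime, the closure contains that Dr. X is a Physician, Chime is a Patient, and both are Persons.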
Ontologies for RDF
 The Web Ontology Language (OWL)
is used to describe ontologies for RDF
graphs
 More sophisticated constraints than
RDFS
 Commonly expressed as an RDF
graph
 Defines the meaning of RDF
statements through constraints:
◦ On their predicates
◦ On the classes of the resources they relate
[Figure: the same RDF graph as on the previous slide, annotated "Governed by OWL / RDFS for domain"]
OWL Formats
 Most common format for describing
ontologies
 Distribution format of ontologies in the
NCBO BioPortal
 SNOMED CT distributions include an
OWL representation
◦ RDF graphs can describe medical content
in a SNOMED CT-compliant way through
the use of this vocabulary
Validation and Deduction
 OWL is based on a formal,
mathematical logic that can be used
for validating the structure of an
ontology and RDF data that conform
to it (consistency checking)
 Used to deduce additional RDF
statements implied by the meaning of
a given RDF graph (logical inference)
 Logical reasoners are used for this
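As a very small illustration of consistency checking, the sketch below flags resources asserted to belong to two classes that have been declared disjoint. The disjointness axiom and the deliberately faulty data are hypothetical, and an actual OWL reasoner does far more than this single check.

```python
def find_inconsistencies(types, disjoint_pairs):
    """Return (resource, class_a, class_b) for each resource in two disjoint classes."""
    clashes = []
    for resource, classes in types.items():
        for (a, b) in disjoint_pairs:
            if a in classes and b in classes:
                clashes.append((resource, a, b))
    return clashes


# Hypothetical axiom: nothing can be both a clinical diagnosis and a person.
disjoint_pairs = [("Clinical Diagnosis", "Person")]

# Asserted and inferred class memberships; "dx1" is deliberately faulty.
types = {
    "Chime": {"Patient", "Person"},
    "dx1": {"Hypertension DX", "Clinical Diagnosis", "Person"},
}

clashes = find_inconsistencies(types, disjoint_pairs)
```

The returned list names the offending resource and the pair of classes, which is the kind of explanation needed to repair either the data or the ontology.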
Inference
 Can infer anatomical location from
SNOMED CT definitions
[Figure: a resource of type Hypertension DX has a finding site whose type is Systemic circulatory system structure]
Hypertension DX <-> 1201005 / “Benign essential hypertension (disorder)”
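The inference can be sketched as follows, assuming the class definition is read as "every Hypertension DX has some finding site that is a Systemic circulatory system structure". The `DEFINITIONS` table and the blank-node naming scheme are hypothetical simplifications; a reasoner would not necessarily materialize the implied node this way.

```python
# Class-level definitions: class -> list of (predicate, class of the implied object).
DEFINITIONS = {
    "Hypertension DX": [("finding site", "Systemic circulatory system structure")],
}


def infer_restrictions(instance_types):
    """For each typed instance, derive the triples implied by its class definition."""
    inferred = set()
    for instance, cls in instance_types.items():
        for predicate, filler_class in DEFINITIONS.get(cls, []):
            site = "_:" + instance + "_site"  # blank node for the implied finding site
            inferred.add((instance, predicate, site))
            inferred.add((site, "type", filler_class))
    return inferred


inferred = infer_restrictions({"dx1": "Hypertension DX"})
```

Knowing only that `dx1` is a Hypertension DX, the sketch derives that it has a finding site in the systemic circulatory system, without that fact being recorded for the individual diagnosis.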
Querying RDF Graphs
 SPARQL is the official query language
for RDF graphs
 Comparable to relational query
languages
◦ Primary difference: it queries RDF triples,
whereas SQL queries tables of arbitrary
dimensions
 Includes various web protocols for
querying RDF graphs
 Foundation of SPARQL is the triple
pattern
 (?clinician, treats, ?patient)
◦ ?clinician and ?patient are variables (like a
wildcard)
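[Figure: query graph in which ?physician is linked to ?patient by treats and to ?dx by author, ?dx is linked to ?patient by subject of record, and ?dx has type Hypertension DX]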
Which physicians have given essential hypertension diagnoses and to w
(?physician, author, ?dx)
(?physician, treats, ?patient)
(?dx, subject of record, ?patient)
(?dx, type, Hypertension DX)
?physician ?patient ?dx
Dr. X Chime …
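To make the matching concrete, here is a toy evaluator for a list of triple patterns, written in plain Python rather than with a SPARQL engine. It is only a sketch: the identifier `dx1` and the extra "Dr. Y" data are hypothetical, and real SPARQL processors are far more capable.

```python
def match_pattern(pattern, triple, binding):
    """Extend binding so that pattern matches triple, or return None on failure."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the same value everywhere it occurs.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result


def evaluate(patterns, triples):
    """Evaluate a list of triple patterns over a set of triples."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for binding in bindings
            for triple in triples
            if (extended := match_pattern(pattern, triple, binding)) is not None
        ]
    return bindings


triples = {
    ("Dr. X", "author", "dx1"),
    ("Dr. X", "treats", "Chime"),
    ("dx1", "subject of record", "Chime"),
    ("dx1", "type", "Hypertension DX"),
    ("Dr. Y", "treats", "Pat"),  # hypothetical data that should not match
}
query = [
    ("?physician", "author", "?dx"),
    ("?physician", "treats", "?patient"),
    ("?dx", "subject of record", "?patient"),
    ("?dx", "type", "Hypertension DX"),
]
results = evaluate(query, triples)
```

Each entry in `results` is one row of the answer table: a value for ?physician, ?patient, and ?dx that satisfies all four patterns at once.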
SPARQL over Relational Data
 Most common implementations
convert SPARQL to SQL and evaluate
over:
◦ a relational database designed for RDF
storage
◦ an existing relational database
 There are products for both
approaches
 Former requires native storage of RDF
◦ Relational structure doesn’t change even
as RDF vocabulary does (schema
Elliot et al. 2009, “A Complete Translation from SPARQL into Efficient SQL”
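A minimal sketch of the first approach, assuming a single three-column table of triples in SQLite. The table layout, the hand-written SQL, and the `severity` predicate are illustrative assumptions; this is not the translation described by Elliot et al.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One fixed table holds every statement, whatever vocabulary it uses.
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("Dr. X", "author", "dx1"),
        ("dx1", "type", "Hypertension DX"),
        ("Dr. X", "treats", "Chime"),
    ],
)

# Hand-written SQL for the patterns
#   (?physician, author, ?dx) and (?dx, type, Hypertension DX):
# each triple pattern becomes one reference to the table, joined on shared variables.
sql = """
SELECT a.s AS physician, a.o AS dx
FROM triples AS a
JOIN triples AS t ON t.s = a.o
WHERE a.p = 'author' AND t.p = 'type' AND t.o = 'Hypertension DX'
"""
rows = conn.execute(sql).fetchall()

# A new predicate is just a new row; the table definition is untouched.
conn.execute("INSERT INTO triples VALUES ('dx1', 'severity', 'mild')")
columns = [row[1] for row in conn.execute("PRAGMA table_info(triples)")]
```

After the vocabulary grows, `columns` still lists only the three original columns, which illustrates why the relational structure stays stable.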
SPARQL over Existing Relational
Data
 “Virtual RDF view”
◦ Translation to SQL follows a given
mapping from existing relational
structures to an RDF vocabulary
◦ Allows non-disruptive evolution of existing
systems
◦ Well-suited as a standard querying
interface over clinical data repositories
◦ They can be queried with SPARQL,
securely over encrypted HTTP
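The idea of a virtual RDF view can be sketched as a mapping that says which existing table and columns supply the subject and object of each predicate, plus a function that turns one triple pattern into SQL. The `diagnosis` table, its columns, and the `MAPPING` structure are all hypothetical; production systems use richer mapping languages.

```python
import sqlite3

# Hypothetical mapping from RDF predicates to an existing relational table.
MAPPING = {
    # predicate: (table, subject column, object column)
    "author": ("diagnosis", "physician", "id"),
    "subject of record": ("diagnosis", "id", "patient"),
}


def pattern_to_sql(predicate, subject=None, obj=None):
    """Translate one triple pattern into SQL; None marks a variable position."""
    table, s_col, o_col = MAPPING[predicate]
    sql = f"SELECT {s_col}, {o_col} FROM {table}"
    conditions, params = [], []
    if subject is not None:
        conditions.append(f"{s_col} = ?")
        params.append(subject)
    if obj is not None:
        conditions.append(f"{o_col} = ?")
        params.append(obj)
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql, params


# The existing (legacy) table is left exactly as it is.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE diagnosis (id TEXT, physician TEXT, patient TEXT)")
conn.execute("INSERT INTO diagnosis VALUES ('dx1', 'Dr. X', 'Chime')")

# Pattern (?dx, subject of record, Chime) evaluated against the relational data.
sql, params = pattern_to_sql("subject of record", obj="Chime")
rows = conn.execute(sql, params).fetchall()
```

No RDF is stored anywhere in this sketch: the triples exist only as a view computed from the relational rows at query time, so existing applications can keep using the table directly.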
[Figure: architecture diagram with the labels Relational, RDF (SNOMED CT perhaps), Mapping and Translation layer, Secure HTTP, SPARQL, SQL, Legacy / existing applications, Patient registry or data repository, and 3rd party applications]
Example: Cleveland Clinic
(SemanticDB)
 Content repository and data
production system released in Jan.
2008
 80 million (native) RDF statements
◦ Uses vocabulary from a patient record
OWL ontology for the registry
 Based on
◦ Existing registry of heart surgery and CV
interventions
◦ 200,000 patient records
◦ Generating over 100 publications per year
Pierce et al. 2012, “SemanticDB: A Semantic Web Infrastructure for Clinical Research and Quality Reporting”
Cohort Identification
 Interface developed in conjunction
with Cycorp
 Leverages their logical reasoning
system (Cyc)
◦ Identifies cohorts using natural language
(NL) sentence fragments
◦ Converts fragments to SPARQL
◦ SPARQL is evaluated against RDF store
Example: Mayo Clinic
(MCLSS)
 Mayo Clinic Life Sciences System
(MCLSS)
◦ Effort to represent Mayo Clinic EHR data
as RDF graphs
◦ Patient demographics, diagnoses,
procedures, lab results, and free-text
notes
◦ Goal was to wrap the MCLSS relational
database and expose it as read-only,
queryable RDF graphs that conform to
standard ontologies
Pathak et al. 2012, "Using Semantic Web Technologies for Cohort Identification from Electronic
Health Records for Clinical Research"
Example: Mayo Clinic (CEM)
 Clinical Element Model (CEM)
◦ Represents logical structure of data in
EHR
◦ Goal: translate CEM definitions into OWL
and patient (instance) data into
conformant RDF
◦ Use tools (logical reasoners) to check the
semantic consistency of the ontology and
instance data, and to extract new
knowledge via deduction
◦ Instance data validation:
 correct number of linked components, value
within data range, existence of units, etc.
Tao et al. 2012, “A semantic-web oriented representation of the clinical element model for
secondary use of electronic health records data”
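The three instance checks listed above can be illustrated with ordinary Python. This is only a sketch of the checks themselves: the work cited here used OWL and logical reasoners, and the field names, the `spec` structure, and the numeric range below are hypothetical.

```python
def validate_instance(instance, spec):
    """Return a list of problems found in one instance record."""
    problems = []
    # Check 1: correct number of linked components.
    if len(instance.get("components", [])) != spec["component_count"]:
        problems.append("wrong number of linked components")
    # Check 2: value within the allowed data range.
    low, high = spec["value_range"]
    value = instance.get("value")
    if value is None or not (low <= value <= high):
        problems.append("value outside allowed range")
    # Check 3: units are present.
    if not instance.get("units"):
        problems.append("missing units")
    return problems


# Hypothetical specification for a two-component measurement.
spec = {"component_count": 2, "value_range": (0, 300)}

good = {"components": ["systolic", "diastolic"], "value": 120, "units": "mm[Hg]"}
bad = {"components": ["systolic"], "value": 400, "units": ""}

good_problems = validate_instance(good, spec)
bad_problems = validate_instance(bad, spec)
```

The well-formed record yields an empty list, while the faulty one reports all three kinds of problem.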
Summary
 Schema extensibility
◦ Use of RDF
 Semantic Interoperability
◦ Domain modeling using OWL and RDFS
 Standardized query interfaces
◦ Querying over SPARQL
 Incremental, non-disruptive adoption
◦ Virtual RDF views
 Main challenge: highly disruptive
innovation
