These slides were presented at the Departmental Seminar of the Computer Science Department, University of Oxford, on 2015-11-10. They explain the principles of Datalog reasoning with RDFox, describe how the HEDIS CDC measures were computed for 265K patients using this technology, present the HL7 RIM-inspired data model, and show an example of an RDFox Datalog rule used to compute the results.
What is the fuss about triple stores? Will triple stores eventually replace relational databases? This talk looks at the big picture, explains the technology, and tries to look at the road ahead.
CIDOC Congress, Dresden, Germany
2014-09-05: International Terminology Working Group: full version (http://vladimiralexiev.github.io/pres/20140905-CIDOC-GVP/index.html)
2014-09-09: Getty special session: short version (http://VladimirAlexiev.github.io/pres/20140905-CIDOC-GVP/GVP-LOD-CIDOC-short.pdf)
YesWorkflow: More Provenance Mileage from Scientific Workflows and Scripts. Keynote at WORKS 2015: Workshop on Workflows in Support of Large-Scale Science. Sunday, Nov. 15, 2015, Austin, Texas.
Study after study shows that data scientists spend 50-90 percent of their time gathering and preparing data. In many large organizations this problem is exacerbated by data being stored on a variety of systems, with different structures and architectures. Apache Drill is a relatively new tool which can help solve this difficult problem by allowing analysts and data scientists to query disparate datasets in place using standard ANSI SQL, without having to define complex schemata or rebuild their entire data infrastructure. In this talk I will introduce the audience to Apache Drill, including some hands-on exercises, and present a case study of how Drill can be used to query a variety of data sources. The presentation will cover:
* How to explore and merge data sets in different formats
* Using Drill to interact with other platforms such as Python
* Exploring data stored on different machines
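As a flavour of the hands-on part, here is a minimal sketch of submitting such a query from Python over Drill's REST API (assumptions: a Drill instance on localhost:8047 with the dfs storage plugin enabled and CSV configured to read header rows; the file paths and column names are hypothetical):

    import requests

    query = """
        SELECT u.name, COUNT(*) AS n_events
        FROM dfs.`/data/events.json` AS e
        JOIN dfs.`/data/users.csv` AS u ON e.user_id = u.user_id
        GROUP BY u.name
    """
    # Drill exposes a REST endpoint that accepts ANSI SQL and returns JSON rows
    resp = requests.post("http://localhost:8047/query.json",
                         json={"queryType": "SQL", "query": query})
    resp.raise_for_status()
    for row in resp.json()["rows"]:
        print(row)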
A talk presented at an NSF Workshop on Data-Intensive Computing, July 30, 2009.
Extreme scripting and other adventures in data-intensive computing
Data analysis in many scientific laboratories is performed via a mix of standalone analysis programs, often written in languages such as Matlab or R, and shell scripts, used to coordinate multiple invocations of these programs. These programs and scripts all run against a shared file system that is used to store both experimental data and computational results.
While superficially messy, the flexibility and simplicity of this approach makes it highly popular and surprisingly effective. However, continued exponential growth in data volumes is leading to a crisis of sorts in many laboratories. Workstations and file servers, even local clusters and storage arrays, are no longer adequate. Users also struggle with the logistical challenges of managing growing numbers of files and computational tasks. In other words, they face the need to engage in data-intensive computing.
We describe the Swift project, an approach to this problem that seeks not to replace the scripting approach but to scale it, from the desktop to larger clusters and ultimately to supercomputers. Motivated by applications in the physical, biological, and social sciences, we have developed methods that allow for the specification of parallel scripts that operate on large amounts of data, and the efficient and reliable execution of those scripts on different computing systems. A particular focus of this work is on methods for implementing, in an efficient and scalable manner, the POSIX file system semantics that underpin scripting applications. These methods have allowed us to run applications unchanged on workstations, clusters, infrastructure as a service ("cloud") systems, and supercomputers, and to scale applications from a single workstation to a 160,000-core supercomputer.
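To make the "scripting" pattern concrete, here is a minimal sketch of the kind of many-task script Swift scales up, written with the Python standard library only (the analysis program name and file layout are hypothetical; Swift itself expresses this more declaratively and runs it on clusters and supercomputers):

    import glob
    import subprocess
    from concurrent.futures import ThreadPoolExecutor

    def analyze(path):
        out = path + ".result"
        # stand-in for a Matlab/R analysis program reading and writing the shared file system
        subprocess.run(["analyze_sample", path, "-o", out], check=True)
        return out

    # run many independent program invocations in parallel, one per input file
    with ThreadPoolExecutor(max_workers=8) as pool:
        results = list(pool.map(analyze, glob.glob("experiments/*.dat")))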
Swift is one of a variety of projects in the Computation Institute that seek individually and collectively to develop and apply software architectures and methods for data-intensive computing. Our investigations seek to treat data management and analysis as an end-to-end problem. Because interesting data often has its origins in multiple organizations, a full treatment must encompass not only data analysis but also issues of data discovery, access, and integration. Depending on context, data-intensive applications may have to compute on data at its source, move data to computing, operate on streaming data, or adopt some hybrid of these and other approaches.
Thus, our projects span a wide range, from software technologies (e.g., Swift, the Nimbus infrastructure as a service system, the GridFTP and DataKoa data movement and management systems, the Globus tools for service oriented science, the PVFS parallel file system) to application-oriented projects (e.g., text analysis in the biological sciences, metagenomic analysis, image analysis in neuroscience, information integration for health care applications, management of experimental data from X-ray sources, diffusion tensor imaging for computer aided diagnosis), and the creation and operation of national-scale infrastructures, including the Earth System Grid (ESG), cancer Biomedical Informatics Grid (caBIG), Biomedical Informatics Research Network (BIRN), TeraGrid, and Open Science Grid (OSG).
For more information, please see www.ci.uchicago.edu/swift.
BioBankCloud: Machine Learning on Genomics + GA4GH @ Med at Scale - Andy Petrella
A talk given at the BioBankCloud conference in Feb 2015 about distributed computing in the contexts of genomics and health.
In this talk, we presented the results we obtained exploring the 1000 Genomes data using ADAM, followed by an introduction to our scalable GA4GH server implementation built with ADAM, Apache Spark and Play Framework 2.
Introduction to Pig & Pig Latin | Big Data Hadoop Spark Tutorial | CloudxLab
Big Data with Hadoop & Spark Training: http://bit.ly/2LF3pBA
This CloudxLab Introduction to Pig & Pig Latin tutorial helps you to understand Pig and Pig Latin in detail. Below are the topics covered in this tutorial:
1) Introduction to Pig
2) Why Do We Need Pig?
3) Pig - Use Cases
4) Pig - Philosophy
5) Pig Latin - Data Flow Language
6) Pig - Local and MapReduce Mode
7) Pig Data Types
8) Load, Store, and Dump in Pig
9) Lazy Evaluation in Pig
10) Pig - Relational Operators - FOREACH, GROUP and FILTER
11) Hands-on with Pig - Calculate the Average Dividend on NYSE (a PySpark analogue is sketched below)
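The tutorial's hands-on is in Pig Latin; purely as an illustration of the final exercise, here is the same average-dividend aggregation sketched in PySpark (file name and column names are hypothetical):

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("nyse-avg-dividend").getOrCreate()
    # assumed schema: symbol, date, dividend
    nyse = spark.read.csv("NYSE_dividends.csv", header=True, inferSchema=True)
    nyse.groupBy("symbol") \
        .agg(F.avg("dividend").alias("avg_dividend")) \
        .show()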
As Daniel Tiger wisely sings, "It's OK to make mistakes. Try to fix them, and learn from them too."
Come learn common mistakes developers make as they model their data in document databases. You'll leave this session ready to spot and correct common document database schema design anti-patterns.
Distributed Query Processing for Federated RDF Data Management - Olaf Goerlitz
PhD defense talk about SPLENDID, a state-of-the-art implementation for efficient distributed SPARQL query processing on Linked Data using SPARQL endpoints and voiD descriptions.
I Mapreduced a Neo store: Creating large Neo4j Databases with Hadoop - GoDataDriven
When exploring very large raw datasets containing massive interconnected networks, it is sometimes helpful to extract your data, or a subset thereof, into a graph database like Neo4j. This allows you to easily explore and visualize networked data to discover meaningful patterns.
When your graph has 100M+ nodes and 1000M+ edges, using the regular Neo4j import tools will make the import very time-intensive (as in many hours to days).
In this talk, I’ll show you how we used Hadoop to scale the creation of very large Neo4j databases by distributing the load across a cluster and how we solved problems like creating sequential row ids and position-dependent records using a distributed framework like Hadoop.
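The talk's implementation uses Hadoop MapReduce; the core trick of handing out sequential row ids without a single-process bottleneck can be sketched with PySpark's zipWithIndex, which counts each partition once and then adds per-partition offsets (input path is hypothetical):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("neo4j-node-ids").getOrCreate()
    nodes = spark.sparkContext.textFile("nodes.csv")  # one node per line
    with_ids = nodes.zipWithIndex()  # (line, globally sequential id)
    with_ids.map(lambda kv: "%d,%s" % (kv[1], kv[0])).saveAsTextFile("nodes_with_ids")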
Although animals do not use language, they are capable of many of the same kinds of cognition as we are; much of our experience is at a non-verbal level.
Semantics is the bridge between surface forms used in language and what we do and experience.
Language understanding depends on world knowledge (e.g. “the pig is in the pen” vs. “the ink is in the pen”)
We might not be ready for executives to specify policies themselves, but we can make the process from specification to behavior more automated, linked to precise vocabulary, and more traceable.
Advances such as SBVR and an English serialization for ISO Common Logic mean that executives and line workers can understand why the system does certain things, or verify that policies and regulations are implemented
A practical guide on how to query and visualize Linked Open Data with eea.daviz Plone add-on.
In this presentation you will get an introduction to Linked Open Data and where it is applied. We will see how to query this large open data cloud over the web using the SPARQL query language. We will then go through real examples and create interactive, live data visualizations with full data traceability using eea.sparql and eea.daviz.
Presented at the PLOG2013 conference http://www.coactivate.org/projects/plog2013
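As a taste of the querying step, here is a minimal Python sketch against a public SPARQL endpoint (assuming the SPARQLWrapper package; the talk itself does this through eea.sparql inside Plone):

    from SPARQLWrapper import SPARQLWrapper, JSON

    sparql = SPARQLWrapper("https://dbpedia.org/sparql")
    sparql.setQuery("""
        PREFIX dbo: <http://dbpedia.org/ontology/>
        SELECT ?country ?population WHERE {
            ?country a dbo:Country ; dbo:populationTotal ?population .
        } LIMIT 10
    """)
    sparql.setReturnFormat(JSON)
    for b in sparql.query().convert()["results"]["bindings"]:
        print(b["country"]["value"], b["population"]["value"])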
Behind the Scenes of KnetMiner: Towards Standardised and Interoperable Knowle... - Rothamsted Research, UK
Workshop within the Integrative Bioinformatics Conference (IB2018, Harpenden, 2018).
We describe how to use Semantic Web Technologies and graph databases like Neo4j to serve life science data and address the FAIR data principles.
In this talk I will show Visualbox, a "visualization server" based on LODSPeaKr that makes it easy for non-JavaScript experts to create simple but meaningful visualizations.
Bringing the Semantic Web closer to reality: PostgreSQL as RDF Graph Database - Jimmy Angelakos
Presentation of an investigation into how Python's RDFLib and SQLAlchemy can be used to leverage PostgreSQL's capabilities to provide a persistent storage back-end for graphs, and become the elusive practical RDF triple store for the Semantic Web (or simply help you export your data to someone who's expecting RDF)!
Talk presented at FOSDEM 2017 in Brussels on 04-05/02/2017. Practical & hands-on presentation with example code which is certainly not optimal ;)
Video:
MP4: http://video.fosdem.org/2017/H.1309/postgresql_semantic_web.mp4
WebM/VP8: http://ftp.osuosl.org/pub/fosdem/2017/H.1309/postgresql_semantic_web.vp8.webm
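A minimal sketch of the approach, assuming the rdflib-sqlalchemy package and a local PostgreSQL database (connection string and file name are hypothetical):

    from rdflib import plugin, Graph, Literal, URIRef
    from rdflib.store import Store
    from rdflib_sqlalchemy import registerplugins

    registerplugins()  # make the SQLAlchemy-backed store known to RDFLib
    ident = URIRef("urn:example:store")
    store = plugin.get("SQLAlchemy", Store)(identifier=ident)
    g = Graph(store, identifier=ident)
    g.open(Literal("postgresql://user:secret@localhost/rdf"), create=True)
    g.parse("data.ttl", format="turtle")  # persist triples into PostgreSQL
    for row in g.query("SELECT (COUNT(*) AS ?n) WHERE { ?s ?p ?o }"):
        print(row.n)
    g.close()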
Data modeling is hard, especially in the world of distributed NoSQL stores. With relational databases, developers have tended to store normalized data and shape their query model around that structure. This can come back to bite you when it comes time to scale, as complex queries across dozens of tables begin to affect application performance. It’s common to find developers rethinking their data model as query latency increases under load.
With NoSQL stores, developers must consider their query patterns from the outset of application development, designing their data model to fit those patterns. A number of techniques, new and old, can be used to allow for maximum performance and scalability.
Topics covered will include de-normalization, time boxing, conflict resolution, and convergent & commutative replicated data types. Common query patterns will also be reviewed in light of the capabilities of various NoSQL data stores.
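Of the techniques listed, convergent replicated data types are the least familiar to most developers; here is a minimal sketch of a grow-only counter (G-Counter): each replica increments only its own slot, and merging takes the element-wise maximum, so all replicas converge regardless of message order:

    class GCounter:
        def __init__(self, replica_id):
            self.replica_id = replica_id
            self.counts = {}  # replica id -> count

        def increment(self, n=1):
            self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + n

        def value(self):
            return sum(self.counts.values())

        def merge(self, other):
            # element-wise max is commutative, associative and idempotent
            for rid, n in other.counts.items():
                self.counts[rid] = max(self.counts.get(rid, 0), n)

    a, b = GCounter("a"), GCounter("b")
    a.increment(3); b.increment(2)        # replicas diverge
    a.merge(b); b.merge(a)                # exchange state in any order
    assert a.value() == b.value() == 5    # and converge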
This is lecture note #10 for my class at the Graduate School of Yonsei University, Korea. It describes SPARQL, the language used to retrieve and manipulate data stored in the Resource Description Framework format.
Enabling Exploratory Analysis of Large Data with Apache Spark and R - Databricks
R has evolved to become an ideal environment for exploratory data analysis. The language is highly flexible - there is an R package for almost any algorithm and the environment comes with integrated help and visualization. SparkR brings distributed computing and the ability to handle very large data to this list. SparkR is an R package distributed within Apache Spark. It exposes Spark DataFrames, which were inspired by R data.frames, to R. With Spark DataFrames, and Spark’s in-memory computing engine, R users can interactively analyze and explore terabyte-size data sets.
In this webinar, Hossein will introduce SparkR and how it integrates the two worlds of Spark and R. He will demonstrate one of the most important use cases of SparkR: the exploratory analysis of very large data. Specifically, he will show how Spark’s features and capabilities, such as caching distributed data and integrated SQL execution, complement R’s great tools such as visualization and diverse packages in a real world data analysis project with big data.
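The webinar's demos are in SparkR; for readers coming from Python, the same two building blocks, caching distributed data and integrated SQL execution, look like this in PySpark (dataset path is hypothetical):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("exploratory").getOrCreate()
    df = spark.read.parquet("events.parquet")
    df.cache()  # keep the distributed data in memory across repeated queries
    df.createOrReplaceTempView("events")
    spark.sql("""
        SELECT country, COUNT(*) AS n
        FROM events GROUP BY country ORDER BY n DESC LIMIT 10
    """).show()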
The Semantic Web is about to grow up. By efforts such as the Linked Open Data initiative, we finally find ourselves at the edge of a Web of Data becoming reality. Standards such as OWL 2, RIF and SPARQL 1.1 shall allow us to reason with and ask complex structured queries on this data, but still they do not play together smoothly and robustly enough to cope with huge amounts of noisy Web data. In this talk, we discuss open challenges relating to querying and reasoning with Web data and raise the question: can the emerging Web of Data ever catch up with the now ubiquitous HTML Web?
One of the greatest benefits of Clojure is its ability to create simple, powerful abstractions that operate at the level of the problem while also operating at the level of the language.
This talk discusses a query processing engine built in Clojure that leverages this abstraction power to combine streams of data for efficient concurrent execution.
* Representing processing trees as s-expressions
* Streams as sequences of data
* Optimizing processing trees by manipulating s-expressions
* Direct execution of s-expression trees
* Compilation of s-expressions into nodes and pipes
* Concurrent processing nodes and pipes using a fork/join pool
An investigation of how PostgreSQL and its latest capabilities (JSONB data type, GIN indices, Full Text Search) can be used to store, index and perform queries on structured Bibliographic Data such as MARC21/MARCXML, breaking the dependence on proprietary and arcane or obsolete software products.
Talk presented at FOSDEM 2016 in Brussels on 31/01/2016. This is a very practical & hands-on presentation with example code which is certainly not optimal ;)
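A minimal sketch of the idea (the JSONB field layout here is invented, not real MARC21; assumes psycopg2 and a local database):

    import psycopg2

    conn = psycopg2.connect("dbname=biblio")
    cur = conn.cursor()
    cur.execute("CREATE TABLE IF NOT EXISTS records (id serial PRIMARY KEY, marc jsonb)")
    # a GIN index makes containment queries on the JSONB column fast
    cur.execute("CREATE INDEX IF NOT EXISTS marc_gin ON records USING GIN (marc)")
    cur.execute("INSERT INTO records (marc) VALUES (%s)",
                ('{"leader": "00000nam", "245": {"a": "Example Title"}}',))
    # @> is JSONB containment, answered via the GIN index
    cur.execute("""SELECT id FROM records WHERE marc @> '{"245": {"a": "Example Title"}}'""")
    print(cur.fetchall())
    conn.commit()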
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2... - pchutichetpong
M Capital Group (“MCG”) expects demand, and the changing evolution of supply, to be shaped by institutional investment rotating out of offices and into work from home (“WFH”), and by the ever-expanding need for data storage as global internet usage grows, with experts predicting 5.3 billion users by 2023. These market factors will be underpinned by technological changes, such as progressing cloud services and edge sites, allowing the industry to see strong expected annual growth of 13% over the next 4 years.
Whilst competitive headwinds remain, exemplified by the recent second bankruptcy filing of Sungard, which blames “COVID-19 and other macroeconomic trends including delayed customer spending decisions, insourcing and reductions in IT spending, energy inflation and reduction in demand for certain services”, the industry has seen key adjustments; MCG believes that engineering cost management and technological innovation will be paramount to success.
MCG reports that the more favorable market conditions expected over the next few years, helped by the winding down of pandemic restrictions and a hybrid working environment, will drive market momentum forward. The continuous injection of capital by alternative investment firms, as well as growing infrastructure investment from cloud service providers and social media companies, whose revenues are expected to grow over 3.6x by value by 2026, will likely help propel data-center provision and innovation. These factors paint a promising picture for the industry players that offset rising input costs and adapt to new technologies.
According to M Capital Group: “Specifically, the long-term cost-saving opportunities available from the rise of remote managing will likely aid value growth for the industry. Through margin optimization and further availability of capital for reinvestment, strong players will maintain their competitive foothold, while weaker players exit the market to balance supply and demand.”
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23... - John Andrews
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
Empowering the Data Analytics Ecosystem: A Laser Focus on Value
The data analytics ecosystem thrives when every component functions at its peak, unlocking the true potential of data. Here's a laser focus on key areas for an empowered ecosystem:
1. Democratize Access, Not Data:
Granular Access Controls: Provide users with self-service tools tailored to their specific needs, preventing data overload and misuse.
Data Catalogs: Implement robust data catalogs for easy discovery and understanding of available data sources.
2. Foster Collaboration with Clear Roles:
Data Mesh Architecture: Break down data silos by creating a distributed data ownership model with clear ownership and responsibilities.
Collaborative Workspaces: Utilize interactive platforms where data scientists, analysts, and domain experts can work seamlessly together.
3. Leverage Advanced Analytics Strategically:
AI-powered Automation: Automate repetitive tasks like data cleaning and feature engineering, freeing up data talent for higher-level analysis.
Right-Tool Selection: Strategically choose the most effective advanced analytics techniques (e.g., AI, ML) based on specific business problems.
4. Prioritize Data Quality with Automation:
Automated Data Validation: Implement automated data quality checks to identify and rectify errors at the source, minimizing downstream issues.
Data Lineage Tracking: Track the flow of data throughout the ecosystem, ensuring transparency and facilitating root cause analysis for errors.
5. Cultivate a Data-Driven Mindset:
Metrics-Driven Performance Management: Align KPIs and performance metrics with data-driven insights to ensure actionable decision making.
Data Storytelling Workshops: Equip stakeholders with the skills to translate complex data findings into compelling narratives that drive action.
Benefits of a Precise Ecosystem:
Sharpened Focus: Precise access and clear roles ensure everyone works with the most relevant data, maximizing efficiency.
Actionable Insights: Strategic analytics and automated quality checks lead to more reliable and actionable data insights.
Continuous Improvement: Data-driven performance management fosters a culture of learning and continuous improvement.
Sustainable Growth: Empowered by data, organizations can make informed decisions to drive sustainable growth and innovation.
By focusing on these precise actions, organizations can create an empowered data analytics ecosystem that delivers real value by driving data-driven decisions and maximizing the return on their data investment.
Levelwise PageRank with Loop-Based Dead End Handling Strategy: SHORT REPORT ... - Subhajit Sahu
Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components, and processes them in topological order, one level at a time. This enables calculation of ranks in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It however comes with a precondition of the absence of dead ends in the input graph. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. The slowdown on the GPU is likely caused by a large submission of small workloads, and is expected to be a non-issue when the computation is performed on massive graphs.
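The decomposition step at the heart of the method can be sketched in a few lines with networkx (a random graph stands in for the report's datasets):

    import networkx as nx

    G = nx.gnp_random_graph(100, 0.05, directed=True, seed=42)
    C = nx.condensation(G)  # DAG of strongly connected components
    for level, comps in enumerate(nx.topological_generations(C)):
        vertices = [v for c in comps for v in C.nodes[c]["members"]]
        # ranks for these vertices depend only on earlier levels, so each
        # level can be computed without per-iteration global communication
        print("level %d: %d vertices in %d components" % (level, len(vertices), len(comps)))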
Opendatabay - Open Data Marketplace.pptx - Opendatabay
Opendatabay.com unlocks the power of data for everyone. Open Data Marketplace fosters a collaborative hub for data enthusiasts to explore, share, and contribute to a vast collection of datasets.
The first open hub for data enthusiasts to collaborate and innovate. A platform to explore, share, and contribute to a vast collection of datasets. Through robust quality control and innovative technologies like blockchain verification, Opendatabay ensures the authenticity and reliability of datasets, empowering users to make data-driven decisions with confidence. Leverage cutting-edge AI technologies to enhance the data exploration, analysis, and discovery experience.
From intelligent search and recommendations to automated data productisation and quotation, Opendatabay's AI-driven features streamline the data workflow. Finding the data you need shouldn't be complex. Opendatabay simplifies the data acquisition process with an intuitive interface and robust search tools. Effortlessly explore, discover, and access the data you need, allowing you to focus on extracting valuable insights. Opendatabay breaks new ground with dedicated, AI-generated synthetic datasets.
Leverage these privacy-preserving datasets for training and testing AI models without compromising sensitive information. Opendatabay prioritizes transparency by providing detailed metadata, provenance information, and usage guidelines for each dataset, ensuring users have a comprehensive understanding of the data they're working with. By leveraging a powerful combination of distributed ledger technology and rigorous third-party audits, Opendatabay ensures the authenticity and reliability of every dataset. Security is at the core of Opendatabay. The marketplace implements stringent security measures, including encryption, access controls, and regular vulnerability assessments, to safeguard your data and protect your privacy.
Semantic Web Technologies in Health Care Analytics
1. SEMANTIC WEB TECHNOLOGIES IN HEALTH CARE ANALYTICS
AN IMPACT SCENARIO FOR DATALOG REASONING WITH RDFOX
Robert Piro
Departmental Seminar
2. OVERVIEW
1 RDFOX
RDF
Datalog
2 PROJECT WITH KAISER PERMANENTE
HEDIS Measures for Diabetic Care
Data Model
Data Model as RDF Triples
The Datalog Rules
3 CONCLUSION & FUTURE WORK
3. RDFox
RDFOX — RESULT OF 4 YEARS OF DEVELOPMENT
RDFOX (BORIS MOTIK, YAVOR NENOV, ROBERT PIRO, IAN HORROCKS)
in memory RDF Triple Store — optimised indexing
parallel Datalog Reasoner — very good scalability
4. RDFox
RDFOX — RESULT OF 4 YEARS OF DEVELOPMENT
RDFOX (BORIS MOTIK, YAVOR NENOV, ROBERT PIRO, IAN HORROCKS)
in memory RDF Triple Store — optimised indexing
parallel Datalog Reasoner — very good scalability
FEATURES
load RDF data (Triples/Turtle)
materialise data — (extended) Datalog language
incremental reasoning / equality reasoning
query data — SPARQL query language
5. RDFox
RDFOX — RESULT OF 4 YEARS OF DEVELOPMENT
RDFOX (BORIS MOTIK, YAVOR NENOV, ROBERT PIRO, IAN HORROCKS)
in memory RDF Triple Store — optimised indexing
parallel Datalog Reasoner — very good scalability
FEATURES
load RDF data (Triples/Turtle)
materialise data — (extended) Datalog language
incremental reasoning / equality reasoning
query data — SPARQL query language
INTEGRATION
stand-alone C++ implementation / C++ library
Java/Python Bridge
SPARQL end-point
6. RDFox RDF
RDF — RESOURCE DESCRIPTION FRAMEWORK
RDF
data format with types
W3C standard
encode semantic data
Triple: subject predicate object (s, p, o)
building blocks: resources & literals
URI — <http://www.w3.org/2001/XMLSchema#double>
String, Boolean, Integer, Decimal — "0.789"^^xsd:double
7. RDFox RDF
RDF — RESOURCE DESCRIPTION FRAMEWORK
RDF
data format with types
W3C standard
encode semantic data
Triple: subject predicate object (s, p, o)
building blocks: resources & literals
URI — <http://www.w3.org/2001/XMLSchema#double>
String, Boolean, Integer, Decimal — "0.789"^^xsd:double
EXAMPLE (ENCODING A DATABASE TABLE IN RDF)
Table: PATIENT_VISIT
REC | MBR | SERV_DT | CPT | ... | DIAG1 | ... | DIAG22
001 | 007 | 20151101 | ...
@prefix ex: <http://my.example.com/FieldName/> .
@prefix visit: <http://my.example.com/Rec/PATIENT_VISIT/> .
visit:001 ex:MBR "007" .
visit:001 ex:SERV_DT "2015-11-01"^^xsd:date .
8. RDFox Datalog
DATALOG
RDF DATALOG RULE
[s0, p0, o0] ← [s1, p1, o1], ..., [sn, pn, on]. ‘IF...AND...THEN...’
Variables start with ‘?’. Var(head) ⊆ Var(body)
EXAMPLE (MATERIALISATION WITH RDFOX)
[?p, ex:has, ex:Diabetes] ← [?p, ex:MBRNo, ?mbr], [?rec, ex:MBR, ?mbr],
[?rec, ex:DIAG, "Diabetes"].
9. RDFox Datalog
DATALOG
RDF DATALOG RULE
[s0, p0, o0] ← [s1, p1, o1], ..., [sn, pn, on]. ‘IF...AND...THEN...’
Variables start with ‘?’. Var(head) ⊆ Var(body)
EXAMPLE (MATERIALISATION WITH RDFOX)
[?p, ex:has, ex:Diabetes] ← [?p, ex:MBRNo, ?mbr], [?rec, ex:MBR, ?mbr],
[?rec, ex:DIAG, "Diabetes"].
Data
p:007 ex:MBRNo "007" . v:001 ex:DIAG "Diabetes" .
v:001 ex:MBR "007" . p:001 ex:MBR "001" .
10. RDFox Datalog
DATALOG
RDF DATALOG RULE
[s0, p0, o0] ← [s1, p1, o1], ..., [sn, pn, on]. ‘IF...AND...THEN...’
Variables start with ‘?’. Var(head) ⊆ Var(body)
EXAMPLE (MATERIALISATION WITH RDFOX)
[?p, ex:has, ex:Diabetes] ← [?p, ex:MBRNo, ?mbr], [?rec, ex:MBR, ?mbr],
[?rec, ex:DIAG, "Diabetes"].
Data
p:007 ex:MBRNo "007" . v:001 ex:DIAG "Diabetes" .
v:001 ex:MBR "007" . p:001 ex:MBR "001" .
11. RDFox Datalog
DATALOG
RDF DATALOG RULE
[s0, p0, o0] ← [s1, p1, o1], ..., [sn, pn, on]. ‘IF...AND...THEN...’
Variables start with ‘?’. Var(head) ⊆ Var(body)
EXAMPLE (MATERIALISATION WITH RDFOX)
[p:007, ex:has, ex:Diabetes] ← [p:007, ex:MBRNo, "007"], [?rec, ex:MBR, "007"],
[?rec, ex:DIAG, "Diabetes"].
Data
p:007 ex:MBRNo "007" . v:001 ex:DIAG "Diabetes" .
v:001 ex:MBR "007" . p:001 ex:MBR "001" .
12. RDFox Datalog
DATALOG
RDF DATALOG RULE
[s0, p0, o0] ← [s1, p1, o1], ..., [sn, pn, on]. ‘IF...AND...THEN...’
Variables start with ‘?’. Var(head) ⊆ Var(body)
EXAMPLE (MATERIALISATION WITH RDFOX)
[p:007, ex:has, ex:Diabetes] ← [p:007, ex:MBRNo, "007"], [v:001, ex:MBR, "001"]
[v:001, ex:DIAG, "Diabetes"].
Data
p:007 ex:MBRNo "007" . v:001 ex:DIAG "Diabetes" .
v:001 ex:MBR "007" . p:001 ex:MBR "001" .
13. RDFox Datalog
DATALOG
RDF DATALOG RULE
[s0, p0, o0] ← [s1, p1, o1], ..., [sn, pn, on]. ‘IF...AND...THEN...’
Variables start with ‘?’. Var(head) ⊆ Var(body)
EXAMPLE (MATERIALISATION WITH RDFOX)
[p:007, ex:has, ex:Diabetes] ← [p:007, ex:MBRNo, "007"], [v:001, ex:MBR, "007"],
[v:001, ex:DIAG, "Diabetes"].
Data
p:007 ex:MBRNo "007" . v:001 ex:DIAG "Diabetes" .
v:001 ex:MBR "007" . p:001 ex:MBR "001" .
14. RDFox Datalog
DATALOG
RDF DATALOG RULE
[s0, p0, o0] ← [s1, p1, o1], ..., [sn, pn, on]. ‘IF...AND...THEN...’
Variables start with ‘?’. Var(head) ⊆ Var(body)
EXAMPLE (MATERIALISATION WITH RDFOX)
[p:007, ex:has, ex:Diabetes] ← [p:007, ex:MBRNo, "007"], [v:001, ex:MBR, "007"],
[v:001, ex:DIAG, "Diabetes"].
Data
p:007 ex:MBRNo "007" . v:001 ex:DIAG "Diabetes" .
v:001 ex:MBR "007" . p:001 ex:MBR "001" .
p:007 ex:has ex:Diabetes .
15. RDFox Datalog
DATALOG
RDF DATALOG RULE
[s0, p0, o0] ← [s1, p1, o1], ..., [sn, pn, on]. ‘IF...AND...THEN...’
Variables start with ‘?’. Var(head) ⊆ Var(body)
EXAMPLE (MATERIALISATION WITH RDFOX)
[?p, ex:has, ex:Diabetes] ← [?p, ex:MBRNo, ?mbr], [?rec, ex:MBR, ?mbr],
[?rec, ex:DIAG, "Diabetes"].
Data
p:007 ex:MBRNo "007" . v:001 ex:DIAG "Diabetes" .
v:001 ex:MBR "007" . p:001 ex:MBR "001" .
p:007 ex:has ex:Diabetes .
16. RDFox Datalog
DATALOG
RDF DATALOG RULE
[s0, p0, o0] ← [s1, p1, o1], ..., [sn, pn, on]. ‘IF...AND...THEN...’
Variables start with ‘?’. Var(head) ⊆ Var(body)
EXAMPLE (MATERIALISATION WITH RDFOX)
[?p, ex:has, ex:Diabetes] ← [?p, ex:MBRNo, ?mbr], [?rec, ex:MBR, ?mbr],
[?rec, ex:DIAG, "Diabetes"].
Data
p:007 ex:MBRNo "007" . v:001 ex:DIAG "Diabetes" .
v:001 ex:MBR "007" . p:001 ex:MBR "001" .
p:007 ex:has ex:Diabetes .
RDFOX COMPUTES all CONSEQUENCES ...
also from newly derived data
in a systematic way
17. RDFox Datalog
DATALOG
RDF DATALOG RULE
[s0, p0, o0] ← [s1, p1, o1], ..., [sn, pn, on]. ‘IF...AND...THEN...’
Variables start with ‘?’. Var(head) ⊆ Var(body)
EXAMPLE (MATERIALISATION WITH RDFOX)
[?p, ex:has, ex:Diabetes] ← [?p, ex:MBRNo, ?mbr], [?rec, ex:MBR, ?mbr],
[?rec, ex:DIAG, "Diabetes"].
Data
p:007 ex:MBRNo "007" . v:001 ex:DIAG "Diabetes" .
v:001 ex:MBR "007" . p:001 ex:MBR "001" .
p:007 ex:has ex:Diabetes .
RDFOX COMPUTES all CONSEQUENCES ... AND TERMINATES
also from newly derived data
in a systematic way
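The slides above step through RDFox matching this rule against the data. To replay the example outside RDFox, one can phrase the rule as a SPARQL INSERT and apply it until no new triples appear; here is a minimal sketch with Python's rdflib (the p: and v: prefixes are made up to match the slides' abbreviations):

    from rdflib import Graph

    g = Graph()
    g.parse(format="turtle", data="""
        @prefix ex: <http://my.example.com/FieldName/> .
        @prefix p:  <http://my.example.com/Patient/> .
        @prefix v:  <http://my.example.com/Rec/PATIENT_VISIT/> .
        p:007 ex:MBRNo "007" .   v:001 ex:DIAG "Diabetes" .
        v:001 ex:MBR   "007" .   p:001 ex:MBR  "001" .
    """)
    rule = """
        PREFIX ex: <http://my.example.com/FieldName/>
        INSERT { ?p ex:has ex:Diabetes }
        WHERE  { ?p ex:MBRNo ?mbr . ?rec ex:MBR ?mbr . ?rec ex:DIAG "Diabetes" . }
    """
    while True:                  # apply the rule until fixpoint, as RDFox does
        before = len(g)
        g.update(rule)
        if len(g) == before:
            break
    # the store now also contains: p:007 ex:has ex:Diabetes .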
18. RDFox Datalog
RDFOX AND DATALOG
STATS
Name | Start (Trp) | End (Trp) | Mem | Cores | Time
DBpedia | 112M | 118M | 6.1GB | 8 | 28s
Claros | 19M | 96M | 4.2GB | 16(32) | 127s
LUBM-1K | 134M | 182M | 9.3GB | 16 | 8s
LUBM-9K | 6G | 9G | ≈100GB | 128(1024) | 8s
19. RDFox Datalog
RDFOX AND DATALOG
STATS
Name | Start (Trp) | End (Trp) | Mem | Cores | Time
DBpedia | 112M | 118M | 6.1GB | 8 | 28s
Claros | 19M | 96M | 4.2GB | 16(32) | 127s
LUBM-1K | 134M | 182M | 9.3GB | 16 | 8s
LUBM-9K | 6G | 9G | ≈100GB | 128(1024) | 8s
FEATURES OF RDFOX DATALOG
Allows many more constructs (arithmetic*, string ops*, comparisons)
Will allow negation, aggregation (can be simulated already)
Generalises OWL 2 RL; reasoning with OWL 2 EL reducible to Datalog
20. RDFox Datalog
RDFOX AND DATALOG
STATS
Name | Start (Trp) | End (Trp) | Mem | Cores | Time
DBpedia | 112M | 118M | 6.1GB | 8 | 28s
Claros | 19M | 96M | 4.2GB | 16(32) | 127s
LUBM-1K | 134M | 182M | 9.3GB | 16 | 8s
LUBM-9K | 6G | 9G | ≈100GB | 128(1024) | 8s
FEATURES OF RDFOX DATALOG
Allows many more constructs (arithmetic*, string ops*, comparisons)
Will allow negation, aggregation (can be simulated already)
Generalises OWL 2 RL; reasoning with OWL 2 EL reducible to Datalog
GENERAL FEATURES OF DATALOG
Intuitive if-then-statements
Declarative (say what, not how to compute)
Powerful due to recursion
21. Project with Kaiser Permanente
KAISER PERMANENTE
THE ORGANISATION
Kaiser HealthPlan, Kaiser Hospitals, Permanente Medical Group
KP largest ‘managed care’ organisation in the U.S.
KP HealthConnect; largest private electronic health record system
STATS
9.6M members
38 medical centres
620 medical offices
177k employees
17k physicians
50k nurses
Turnover 56.4G USD
Net income 3.1G USD
22. Project with Kaiser Permanente HEDIS Measures for Diabetic Care
HEALTHCARE EFFECTIVENESS DATA AND INFORMATION SET
HEDIS
Performance measure specification issued by NCQA(1) (USA)
Percentages of a precisely defined eligible population:
#Eligible with eye exam ÷ #Eligible (is diabetic, ≤65yo, etc.)
Entry requirements for government-funded healthcare (Medicare)
(1) National Committee for Quality Assurance
23. Project with Kaiser Permanente HEDIS Measures for Diabetic Care
HEALTHCARE EFFECTIVENESS DATA AND INFORMATION SET
HEDIS
Performance measure specification issued by NCQA(1) (USA)
Percentages of a precisely defined eligible population:
#Eligible with eye exam ÷ #Eligible (is diabetic, ≤65yo, etc.)
Entry requirements for government-funded healthcare (Medicare)
HEDIS MEASURE COMPUTATION: TODAY
Disparate data sources (historically grown)
Ad-hoc schemas used to store data (meaning implicit)
Involved programs for analytics software:
mix data (re)formatting and measuring
difficult to maintain
require deep IT expertise
(1) National Committee for Quality Assurance
24. Project with Kaiser Permanente HEDIS Measures for Diabetic Care
HEDIS MEASURE COMPUTATION IN OUR PROJECT
NEW APPROACH (PETER HENDLER, ROBERT PIRO)
Separate data aggregation and reformatting from computing measures!
Data model inspired by HL7 RIM: ‘Entities in Roles Participating in Acts’
Data translated as RDF-triples into the data model first (Java/Scala)
RDFox Datalog rules compute measures according to this model
Results are read out through simple queries
25. Project with Kaiser Permanente HEDIS Measures for Diabetic Care
HEDIS MEASURE COMPUTATION IN OUR PROJECT
NEW APPROACH (PETER HENDLER, ROBERT PIRO)
Separate data aggregation and reformatting from computing measures!
Data model inspired by HL7 RIM: ‘Entities in Roles Participating in Acts’
Data translated as RDF-triples into the data model first (Java/Scala)
RDFox Datalog rules compute measures according to this model
Results are read out through simple queries
BENEFITS
Reusability: uniform data model reusable for other tasks
Efficiency: rules are close to natural language & concise
Maintainability: rules are declarative and easy to understand
26. Project with Kaiser Permanente Data Model
DATA MODEL
INSPIRED BY HL7 REFERENCE INFORMATION MODEL (RIM)
Entity -hasRole-> Role -hasPart-> Participation -hasAct-> Act
ISO standard: ISO/HL7 21731:2014
Process centric (Administrative KR)
Developed for/in the medical community; BUT ‘NHS experience’
27. Project with Kaiser Permanente Data Model
DATA MODEL
INSPIRED BY HL7 REFERENCE INFORMATION MODEL (RIM)
Entity -hasRole-> Role -hasPart-> Participation -hasAct-> Act
ISO standard: ISO/HL7 21731:2014
Process centric (Administrative KR)
Developed for/in the medical community; BUT ‘NHS experience’
EXAMPLE
Getting a coffee
Person -hasRole-> Customer -hasPart-> Purchaser -hasAct-> ‘Buying a product’
Person -hasRole-> Barista -hasPart-> Preparer -hasAct-> ‘Buying a product’
Subst -hasRole-> Coffee -hasPart-> Product -hasAct-> ‘Buying a product’
Person -hasRole-> Customer -hasPart-> Consumer -hasAct-> ‘Buying a product’
28. Project with Kaiser Permanente Data Model
DATA MODEL
INSPIRED BY HL7 REFERENCE INFORMATION MODEL (RIM)
Entity -hasRole-> Role -hasPart-> Participation -hasAct-> Act
ISO standard: ISO/HL7 21731:2014
Process centric (Administrative KR)
Developed for/in the medical community; BUT ‘NHS experience’
EXAMPLE
Contract for Work
Person -hasRole-> Customer -hasPart-> Offering Party -hasAct-> ‘Buying a product’
Person -hasRole-> Representative -hasPart-> Accepting Party -hasAct-> ‘Buying a product’
Subst -hasRole-> Coffee -hasPart-> Work Result -hasAct-> ‘Buying a product’
Person -hasRole-> Customer -hasPart-> Beneficiary -hasAct-> ‘Buying a product’
29. Project with Kaiser Permanente Data Model
DATA MODEL
INSPIRED BY HL7 REFERENCE INFORMATION MODEL (RIM)
Entity -hasRole-> Role -hasPart-> Participation -hasAct-> Act
ISO standard: ISO/HL7 21731:2014
Process centric (Administrative KR)
Developed for/in the medical community; BUT ‘NHS experience’
EXAMPLE
Prescription
Person -hasRole-> Physician -hasPart-> Prescriber -hasAct-> Prescription
Person -hasRole-> Pharmacist -hasPart-> Dispenser -hasAct-> Prescription
Subst -hasRole-> Drug -hasPart-> Medication -hasAct-> Prescription
Person -hasRole-> Patient -hasPart-> Recipient -hasAct-> Prescription
30. Project with Kaiser Permanente Data Model as RDF Triples
DATA MODEL AS RDF TRIPLES
DATA MODEL USED FOR HEDIS
Entity (EN00): Name "John Smith"; Gender kp:male; DoB "1973-10-22"^^xsd:date; type cat:person
Role (RL00): type cat:Patient
Participation (PT00): type cat:Subject
Act (ACT00): Date "2013-03-22"^^xsd:date; type cat:Diagnosis
EN00 -kp:hasRole-> RL00 -kp:hasPart-> PT00 -kp:hasContext-> ACT00
31. Project with Kaiser Permanente Data Model as RDF Triples
DATA MODEL AS RDF TRIPLES
DATA MODEL USED FOR HEDIS
Entity (EN00): Name "John Smith"; Gender kp:male; DoB "1973-10-22"^^xsd:date; type cat:person
Role (RL00): type cat:Patient
Participation (PT00): type cat:Subject
Act (ACT00): Date "2013-03-22"^^xsd:date; type cat:Diagnosis
EN00 -kp:hasRole-> RL00 -kp:hasPart-> PT00 -kp:hasContext-> ACT00
ENCODING IN RDF-TRIPLES
EN00 kp:DoB "1973-10-22"^^xsd:date .
EN00 kp:hasRole RL00 .
RL00 rdf:type kp:Patient .
RL00 kp:hasPart PT00 .
PT00 kp:hasContext ACT00 .
ACT00 rdf:type cat:Diagnosis .
32. Project with Kaiser Permanente Data Model as RDF Triples
DATA TRANSLATION
DATA PROVIDED
Real data from a KP regional branch(2)
Data: ASCII files, one record per line, pipe-separated fields
MBR | SERV_DT | CPT | ... | DIAG1 | ... | DIAG22 | PROVNBR
(2) The data never left Kaiser
33. Project with Kaiser Permanente Data Model as RDF Triples
DATA TRANSLATION
DATA PROVIDED
Real data from a KP regional branch(2)
Data: ASCII files, one record per line, pipe-separated fields
MBR | SERV_DT | CPT | ... | DIAG1 | ... | DIAG22 | PROVNBR
DATA STATS
About | Records | Size
Providers | 113k | 6.8M
Members | 466k | 84MB
Enrollments | 3.3M | 332MB
Labs | 28.3M | 1.4GB
Prescriptions | 8.9M | 892MB
Visits | 54M | 8.6GB
(2) The data never left Kaiser
34. Project with Kaiser Permanente Data Model as RDF Triples
DATA TRANSLATION
DATA PROVIDED
Real data from a KP regional branch(2)
Data: ASCII files, one record per line, pipe-separated fields
MBR | SERV_DT | CPT | ... | DIAG1 | ... | DIAG22 | PROVNBR
DATA STATS
About | Records | Size
Providers | 113k | 6.8M
Members | 466k | 84MB
Enrollments | 3.3M | 332MB
Labs | 28.3M | 1.4GB
Prescriptions | 8.9M | 892MB
Visits | 54M | 8.6GB
TRANSLATION & IMPORT
Translation time: 45min @ 8 threads
902M triples (4.6GB gzipped), 547M unique
RDFox import time 390s @ 8 threads
(2) The data never left Kaiser
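The actual translation was written in Java/Scala; as a toy sketch of this step, the following Python turns one pipe-separated record into triples in the slide 7 scheme (field list shortened, values invented):

    FIELDS = ["MBR", "SERV_DT", "CPT"]  # leading fields of a PATIENT_VISIT record

    def record_to_turtle(rec_id, line):
        values = line.split("|")
        return "\n".join(
            'visit:%s ex:%s "%s" .' % (rec_id, name, value.strip())
            for name, value in zip(FIELDS, values) if value.strip()
        )

    print(record_to_turtle("001", "007|2015-11-01|99213"))
    # visit:001 ex:MBR "007" .
    # visit:001 ex:SERV_DT "2015-11-01" .
    # visit:001 ex:CPT "99213" .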
35. Project with Kaiser Permanente The Datalog Rules
DATALOG RULES
RULES HEDIS DIABETES CARE DENOMINATORS AND NUMERATORS
174 rules in 607 lines of code distributed in 21 files
authored on a 200-patient test set using an interactive authoring tool
36. Project with Kaiser Permanente The Datalog Rules
DATALOG RULES
RULES HEDIS DIABETES CARE DENOMINATORS AND NUMERATORS
174 rules in 607 lines of code distributed in 21 files
authored on a 200-patient test set using an interactive authoring tool
MATERIALISATION
8 Intel Xeon E5-2680@2.7GHz with 64GB RAM
Data import + materialisation: 1h40m
Maximal number of triples before subgraph extraction: 731M (43GB)
Subgraph 71.7M triples (4GB), maximal number of triples: 92.2M (4.8GB)
37. Project with Kaiser Permanente The Datalog Rules
DATALOG RULES
RULES HEDIS DIABETES CARE DENOMINATORS AND NUMERATORS
174 rules in 607 lines of code distributed in 21 files
authored on a 200-patient test set using an interactive authoring tool
MATERIALISATION
8 Intel Xeon E5-2680@2.7GHz with 64GB RAM
Data import + materialisation: 1h40m
Maximal number of triples before subgraph extraction: 731M (43GB)
Subgraph 71.7M triples (4GB), maximal number of triples: 92.2M (4.8GB)
SUMMARY
Data is translated into RDF triples
RDFox computes the materialisation from the Datalog program and the RDF triples
Results are obtained by querying the triple store (SPARQL)
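Continuing the rdflib sketch from the Datalog section, such a readout query could look as follows (the predicate names follow the earlier toy example, not the real rule set):

    for row in g.query("""
        PREFIX ex: <http://my.example.com/FieldName/>
        SELECT (COUNT(DISTINCT ?p) AS ?n) WHERE { ?p ex:has ex:Diabetes }
    """):
        print("patients derived to have diabetes:", row.n)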
38. Project with Kaiser Permanente The Datalog Rules
RULE EXAMPLE
EXAMPLE
Patients must be enrolled and can have multiple enrollments in a year.
Enrollments are given as [begin-date, end-date] pairs per patient.
"Compute all patients with continuous enrollment within the measurement year", i.e. the enrollments must form a connected chain
[x0, x1] ... [xi, xi+1] [xi+1, xi+2] ... [xn-1, xn]
such that "2013-01-01" and "2013-12-31" are enclosed by some interval.
The chain is built recursively: an enrollment whose end date equals the begin date of an already-chained enrollment is added to the chain, as in the rule below.
[?Patient, aux:continuousEnrollment, ?PredEnr] ←
[?Patient, aux:continuousEnrollment, ?Enr],
[?Enr, kp:hasBeginConnectDateTime, ?begin],
[?Patient, aux:roleHasEnrollment, ?PredEnr],
[?PredEnr, kp:hasEndDateTime, ?begin] .
39. Conclusion & Future Work
CONCLUSION & FUTURE WORK
CONCLUSION
Created a use-case / Impact Scenario: real requirements, real data
Rooting of research; usefulness of RDFox, new avenues, benchmarks
FUTURE WORK
Rule authoring tool / anonymisation of the data
Research
stratification of the reasoning
negation + aggregates
Big data reasoning + browsing
www.rdfox.org