Health care data is growing at an explosive rate: highly detailed physiological processes are being recorded, high-resolution scanning techniques (e.g., MRI) and wireless health monitoring systems are in wide use, and traditional patient information is moving to Electronic Medical Record (EMR) systems. The challenges in leveraging these huge data resources and transforming them into knowledge that improves patient care include the size of the datasets, their multi-modality, and the traditional forms of heterogeneity (syntactic, structural, and semantic). In addition, the US NIH is placing more emphasis on multi-center clinical studies, which increases the complexity of data access, sharing, and integration. In this talk, I explore potential solutions to these challenges that use the semantics of clinical data, both implicit and explicit, together with Semantic Web technologies. I specifically discuss the ontology-driven Physio-MIMI platform for clinical data management in multi-center research studies.
Further Details: http://cci.case.edu/cci/index.php/Satya_Sahoo
Presentation at: Dagstuhl Seminar: Semantic Data Management 2012
Author: Satya S. Sahoo
Awakening Clinical Data: Semantics for Scalable Medical Research Informatics
1. Awakening Clinical Data: Semantics for
Scalable Medical Research Informatics
Satya S. Sahoo
Division of Medical Informatics
Electrical Engineering and Computer Science Department
Case Western Reserve University
Cleveland, OH, USA
2. Big Picture of Data in Clinical Research
• 143,961 patients per year (e.g. Emory)
• MRI, PET scans – MRI: 50-100MB; PET: 60-100MB
• Patient Reports
• Epilepsy Monitoring Unit (EMU) Data – 500-600MB per patient per stay in EMU; Case Western EMU: 250 TB
• Wireless Health Data – ~5.6 billion wireless connections and growing
• Polysomnograms – 1-20GB each
• National Sleep Research Resource: 500 TB
• Pathology Reports, Tissue Bank
Sources: PRISM project, BME dept CWRU; PRISM project CWRU; CWRU School of Engineering; Physio-MIMI, PRISM CWRU; NLM and Wikipedia
3. Big Picture of Data in Clinical Research
• Ultra large volume of data and growing rapidly
• Data is Multi-modal, Heterogeneous
• Heterogeneity: Syntactic, Structural, Semantic
4. Scalability in Medical Informatics: Beyond Volume
Exemplar: Sleep Medicine Research
Data types shown: MRI, PET scans; Patient Reports; Epilepsy Monitoring Unit (EMU) Data; Wireless Health Data; Polysomnograms; Pathology Reports, Tissue Bank
Sources: PRISM project, BME dept CWRU; PRISM project CWRU; CWRU School of Engineering; Physio-MIMI, PRISM CWRU; NLM and Wikipedia
5. Scalability in Medical Informatics: Beyond Volume
Exemplar: Sleep Medicine Research
• Multi-Center Studies with differing
administrative requirements – business logic
• Dynamic data – grows over project duration
• Data Semantics as foundation to support a
wide spectrum of users – clinicians, nurse
practitioners, research fellows
6. A Wish List for Scalable Clinical Data Management
• Reconcile Data Heterogeneity – most critical to successful
translational research
o Syntactic heterogeneity – less of a problem, data dictionaries
help
o Structural heterogeneity – problematic, XML somewhat helpful
o Semantic heterogeneity – a huge problem, ontologies to the
rescue?
• Provenance – essential for data quality, compliance, insight
o Blood Oxygen Baseline: oxygen saturation during the first 15 or
30 seconds of sleep
o Patient's blood report from last month as the cause of a change in medication
– Domain Provenance (not just tuple provenance)
• Intuitive access to information – clinical trials eligibility,
cohort identification
• Scalable - Data sources, research partners added or removed
dynamically
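The blood oxygen baseline item above can be made concrete with a small sketch. The sample layout, the function name `spo2_baseline`, and the provenance fields are all assumptions chosen for illustration, not part of Physio-MIMI. The sketch only shows why the window length (15 or 30 seconds) has to travel with the derived value as domain provenance.

```python
from statistics import mean

def spo2_baseline(samples, sleep_onset_s, window_s=30):
    """Mean oxygen saturation over the first `window_s` seconds of sleep.

    `samples` is a list of (time_in_seconds, spo2_percent) pairs; this
    layout is hypothetical and chosen only for the example.
    """
    window = [v for t, v in samples
              if sleep_onset_s <= t < sleep_onset_s + window_s]
    if not window:
        raise ValueError("no SpO2 samples in the baseline window")
    return {
        "value": mean(window),
        # Domain provenance: how the value was defined,
        # not just which rows it was computed from.
        "provenance": {
            "definition": f"mean SpO2 over first {window_s} s of sleep",
            "sleep_onset_s": sleep_onset_s,
            "n_samples": len(window),
        },
    }

# Toy recording: one sample every 10 s, saturation drops at t = 120 s.
samples = [(t, 96 if t < 120 else 94) for t in range(0, 200, 10)]
baseline_15 = spo2_baseline(samples, sleep_onset_s=100, window_s=15)
baseline_30 = spo2_baseline(samples, sleep_onset_s=100, window_s=30)
```

With the same recording, the two window lengths give different baselines, so a study that stores only the number cannot be compared safely with one that used the other definition.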
7. A “not to do” list for Clinical Data Management
Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch
• No Linked Open Patient Data – HIPAA, HITECH
Act (US), Data Protection Act (UK)
o De-identified data – IRB approval
• Ontology as global schema – but no RDF
o Vast majority of data is stored in RDBs
o Practical issues with RDF – URIs cannot be
institution-specific (privacy)
8. Physio-MIMI: Multi‐Modality, Multi‐Resource Environment for Physiological
and Clinical Research
[Architecture diagram: Clinical Researcher; ontologies: Sleep Domain Ontology, SNOMED-CT, FMA, OGMS, …; any number of new centers]
9. Physio-MIMI: Enabling Scalable Medical Research
• NCRR‐funded, multi‐CTSA site project: Sleep medicine as
exemplar
• Federated data management – scalable, adapts to changing
data access policies
• Ontology-driven:
o Data mappings – Ontology class to data dictionary terms
(manually curated)
o Drive query interface
o Manage provenance
• Privacy aware, IRB-compliant
• Collaboration among Case Western, U. of Michigan,
Marshfield Clinic and U. of Wisconsin, Madison
o Now Harvard Medical School
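A minimal sketch of the federated, ontology-driven idea described on this slide, assuming in-memory data in place of real site databases. The site names, ontology class names, local column names, and the `access_allowed` flag are hypothetical; the slides do not specify how Physio-MIMI represents any of them.

```python
# Hypothetical, manually curated mappings:
# ontology class -> local data dictionary term at each site.
MAPPINGS = {
    "site_a": {"BodyMassIndex": "bmi", "ApneaHypopneaIndex": "ahi_total"},
    "site_b": {"BodyMassIndex": "BMI_CALC"},
    "site_c": {"BodyMassIndex": "body_mass_idx"},
}

# Toy de-identified records held at each site, keyed by local terms.
SITE_DATA = {
    "site_a": [{"bmi": 31.0, "ahi_total": 12.5}, {"bmi": 24.0, "ahi_total": 3.0}],
    "site_b": [{"BMI_CALC": 35.2}, {"BMI_CALC": 22.1}],
    "site_c": [{"body_mass_idx": 40.0}],
}

# Per-site access policy; a site can change this without touching the others.
ACCESS_ALLOWED = {"site_a": True, "site_b": True, "site_c": False}

def federated_count(ontology_class, predicate):
    """Count matching records per site for a query phrased in ontology terms."""
    counts = {}
    for site, mapping in MAPPINGS.items():
        if not ACCESS_ALLOWED.get(site, False):
            continue  # site currently does not share data
        local_term = mapping.get(ontology_class)
        if local_term is None:
            continue  # site has no data dictionary term for this class
        counts[site] = sum(
            1 for row in SITE_DATA[site] if predicate(row[local_term])
        )
    return counts

obese = federated_count("BodyMassIndex", lambda v: v >= 30)
severe = federated_count("ApneaHypopneaIndex", lambda v: v >= 10)
```

The query is written once against ontology classes; adding a center means adding one mapping entry and one policy entry, which is the sense in which this design scales to new partners.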
11. Data Mappings: Sleep Domain Ontology (SDO) to Data Dictionary
Physio-Map Module
• Visual interface
• Stores mappings in XML –
moving towards rules
• Dynamically executed in response
to user query
User Voting
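As an illustration of mappings stored in XML and executed at query time, the sketch below parses a mapping file and produces one site-local query per mapped source. The XML element and attribute names, the table and column names, and the generated SQL are assumptions made for this example; the actual Physio-Map format is not shown in the slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping document; not the actual Physio-Map schema.
MAPPING_XML = """
<mappings>
  <mapping ontologyClass="SleepStudy" site="site_a" table="psg" column="study_id"/>
  <mapping ontologyClass="SleepStudy" site="site_b" table="sleep_visits" column="visit_no"/>
  <mapping ontologyClass="Medication" site="site_a" table="meds" column="drug_name"/>
</mappings>
"""

def load_mappings(xml_text):
    """Group (site, table, column) triples by ontology class."""
    root = ET.fromstring(xml_text.strip())
    by_class = {}
    for m in root.iter("mapping"):
        by_class.setdefault(m.get("ontologyClass"), []).append(
            (m.get("site"), m.get("table"), m.get("column"))
        )
    return by_class

def queries_for(ontology_class, mappings):
    """Translate one ontology-level request into per-site SQL strings."""
    return {
        site: f"SELECT {column} FROM {table}"
        for site, table, column in mappings.get(ontology_class, [])
    }

mappings = load_mappings(MAPPING_XML)
sleep_queries = queries_for("SleepStudy", mappings)
```

Because the translation happens when the user submits a query, a corrected mapping takes effect on the next request without reloading any data.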
16. Intuitive Query Interface: Ontology (SDO)-driven
Visual Aggregator and Explorer (VisAgE)
DataSets
Ontology Concept – Type of Query Widget
17. Physio-MIMI in National Sleep Research Resource
• National Sleep Research Resource (NSRR) – scored and
awaiting funding review
• Collaboration between Harvard Medical School (domain
experts) and Case Western (CS) with 15 projects
o 50,000 sleep research studies – total size of 500TB
• Semantic Data Integration – SDO and Sleep Provenance
Ontology (extending W3C PROV Ontology PROV-O)
• Signal processing tools – using a common format called
European Data Format (EDF), XML-based
• Domain analysis, cross-linking – secure Web access
18. Challenges: Semantics in Large Scale Clinical Data
• Incentives for adopting RDF in clinical data management
– what is not already possible in RDB?
• OWL2, RDFS reasoning – Privacy aware reasoning,
semantics-aware access control (Nguyen et al. 2012)
• Missing Semantics?
o Variable or missing provenance in the original study –
re-create provenance from (limited) provenance?
o Fine-level granularity for semantic annotation of
signal data – currently not scalable
• A little semantics does not go too far in clinical data
o Need for greater involvement of Semantic Web
community in development of EHR systems