Talk presented in Spain (WiMS 2013/UAM-Madrid, UMA-Malaga), June 2013.
Replaces earlier version at: http://www.slideshare.net/apsheth/semantic-technology-empowering-real-world-outcomes-in-biomedical-research-and-clinical-practices
Biomedical and translational research, as well as clinical practice, are increasingly data driven. Activities routinely involve large numbers of devices, data and people, resulting in challenges associated with volume, velocity (change), variety (heterogeneity) and veracity (provenance, quality). Equally important is the challenge of serving the needs of a broader ecosystem of people and organizations, extending beyond traditional stakeholders like drug makers, clinicians and policy makers to increasingly technology-savvy and information-empowered patients. We believe that semantics is becoming the centerpiece of informatics solutions that convert data into meaningful, contextually relevant information and insights that lead to optimal decisions for translational research and 360-degree health, fitness and well-being.
In this talk, I will provide a series of snapshots of efforts in which a semantic approach and technology are the key enabler. I will emphasize real-world and in-use projects, technologies and systems, involving significant collaborations between my team and biomedical researchers or practicing clinicians. Examples include:
• Active Semantic Electronic Medical Record
• Semantics and Services enabled Problem Solving Environment for T.cruzi (SPSE)
• Data Mining of Cardiology data
• Semantic Search, Browsing and Literature Based Discovery
• PREscription Drug abuse Online Surveillance and Epidemiology (PREDOSE)
• kHealth: development of knowledge-enhanced sensing and mobile computing applications (using low-cost sensors and smartphones), along with the ability to convert low-level observations into clinically relevant abstractions
Further details are at http://knoesis.org/amit/hcls
Semantic Web & Web 3.0 empowering real world outcomes in biomedical research and clinical practices
1. 1
Semantic Web & Web 3.0 empowering real
world outcomes in biomedical research and
clinical practices
Amit Sheth
Kno.e.sis – Ohio Center of Excellence in Knowledge-enabled Computing
Wright State University, Dayton, Ohio
http://knoesis.org
http://knoesis.org/amit/hcls
Special thanks: Sujan Perera; Ack: Kno.e.sis HCLS team and collaborators
Talk presented in Spain (WiMS 2013/UAM-Madrid, UMA-Malaga), June 2013
6. Role of Semantic Web in HCLS
• Improve the machine understandability and
processing of all types of data by
• Modeling and Background Knowledge
• Annotation
• Complex Querying/Analysis, Reasoning
• Improve Insight from Biomedical Data
• Improve Clinical Decision Making
• Vastness/Volume
• Velocity
• Variety/Heterogeneity
• Vagueness, Uncertainty, Inconsistency, Deceit
Objective
Challenges
Approach
7. Identifiers: URI Character set: UNICODE
Syntax: XML
Data interchange: RDF
Querying:
SPARQL
Taxonomies: RDFS
Ontologies:
OWL
Rules:
RIF/SWRL
Unifying logic
Proof
Trust
Cryptography
User interface and applications
Querying
Data/Knowledge
Representation
Knowledge
Representation
Lots of need for NLP, ML, IR, and other technologies –
SW significantly empowers these and closes some critical gaps
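As a rough illustration of the data interchange and querying layers in this stack, the sketch below stores RDF-style (subject, predicate, object) triples in plain Python and answers a SPARQL-like pattern query. The `match` helper and the sample triples are hypothetical stand-ins for a real triple store and SPARQL engine.

```python
# Minimal sketch: RDF-style triples and a SPARQL-like pattern match.
# The data and the match() helper are illustrative assumptions,
# not part of any Kno.e.sis system.
triples = {
    ("Suboxone", "subClassOf", "Buprenorphine"),
    ("Buprenorphine", "has_slang_term", "bupe"),
    ("Syncope", "is_symptom_of", "Atrial Fibrillation"),
}

def match(pattern, store):
    """Return every triple that fits the pattern; None acts as a wildcard,
    playing the role of a SPARQL variable."""
    return sorted(
        t for t in store
        if all(p is None or p == v for p, v in zip(pattern, t))
    )

# Roughly: SELECT ?s WHERE { ?s is_symptom_of "Atrial Fibrillation" }
symptoms = [s for s, _, _ in match((None, "is_symptom_of", "Atrial Fibrillation"), triples)]
print(symptoms)
```

A production system would use an actual RDF store and SPARQL endpoint; the point here is only the triple shape and pattern-based querying.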
8. HCLS Apps @ Kno.e.sis
• Semantic Search and
Browsing(Doozer++,
SCOONER, iExplore)
• Semantics and Services
enabled Problem Solving
Environment for
T.cruzi(SPSE)
• Active Semantic Electronic
Medical Record(ASEMR)
• Mining and Analysis of
EMR(ezFIND, ezMeasure,
ezCAC)
• kHealth (ADHF, Asthma, …)
• PREscription Drug abuse
Online Surveillance and
Epidemiology(PREDOSE)
Biomedical
Healthcare
Epidemiology
10. Knowledge Acquisition – Doozer++
• Building ontology is costly
• Large volume of knowledge available in semi-
structured/unstructured format
• No assurance for the credibility of such
knowledge
15. Doozer++ Demo
Knowledge Acquisition from Community-Generated Content
Continuous Semantics to Analyze Real-Time Data , IEEE Internet Computing (Volume 14)
16. • Identify Relationships
• Textual pattern-based extraction for known
relationships
• Facts available in background knowledge
• Find evidence for such facts
• Combined evidence from many different
patterns increases the certainty of a
relationship between the entities
Beyond Hierarchy
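The idea that combined evidence from several textual patterns raises the certainty of a relationship can be sketched as follows. The patterns, their reliability weights, and the noisy-OR combination rule are assumptions chosen for illustration; the slides do not state how Doozer++ actually scores evidence.

```python
import re

# Hypothetical textual patterns for one known relationship type ("treats");
# each carries an assumed reliability weight.
PATTERNS = [
    (re.compile(r"(\w+) is used to treat (\w+)"), 0.8),
    (re.compile(r"(\w+) for the treatment of (\w+)"), 0.7),
]

def extract(sentences):
    """Collect, per (subject, object) pair, the weights of all matching patterns."""
    evidence = {}
    for sentence in sentences:
        for pattern, reliability in PATTERNS:
            for subj, obj in pattern.findall(sentence):
                evidence.setdefault((subj.lower(), obj.lower()), []).append(reliability)
    return evidence

def confidence(reliabilities):
    """Noisy-OR combination: each additional match pushes the score toward 1."""
    miss = 1.0
    for r in reliabilities:
        miss *= 1.0 - r
    return 1.0 - miss

sentences = [
    "Warfarin is used to treat thrombosis.",
    "Patients received warfarin for the treatment of thrombosis.",
]
scores = {pair: confidence(rs) for pair, rs in extract(sentences).items()}
print(scores)
```

With two independent pattern matches, the combined score is higher than either weight alone, which mirrors the claim on the slide.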
17. • Evaluating acquired knowledge
• Explicit
• User can vote for facts
• Facts presented based on user interests
• Implicit
• User’s browsing history used as an indication of
which propositions are correct and interesting
• Now it adds validated knowledge back to community
Validating Knowledge
18. Base Hierarchy from
Wikipedia
SenseLab Neuroscience
Ontologies
Meta Knowledgebase
PubMed Abstracts
Focused pattern
based extraction
Initial KB creation
Enriched
Knowledgebase
HPC
Keywords
Kno.e.sis: NLP
based triples
NLM: Rule based
BKR triples
Building Human Performance &
Cognition Ontology (HPCO)
Merge
http://wiki.knoesis.org/index.php/Human_Performance_and_Cognition_Ontology
19. Use Case for HPCO
• Number of Entities – 2 million
• Number of non-trivial facts – 3 million
• NLP Based*: calcium-binding protein S100B
modulates long-term synaptic plasticity
• Pattern Based**: Olfactory Bulb has physical
part of anatomic structure Mitral cell
* Joint Extraction of Compound Entities and Relationships from Biomedical Literature , Web Intel. 2008
* A Framework for Schema-Driven Relationship Discovery from Unstructured Text, ISWC 2006
** On Demand Creation of Focused Domain Models using Top-down and Bottom-up Information Extraction, Technical
Report
21. SCOONER Demo
SCOONER Details
An Up-to-date Knowledge-Based Literature Search and Exploration Framework for Focused
Bioscience Domains , IHI 2012- 2nd ACM SIGHIT International Health Informatics Symposium
22. Kino
• An integrated suite of tools that enables
scientists to annotate
– unstructured resources
– semi-structured resources
• Annotates documents by accessing NCBO
ontologies, via the NCBO Web API.
• Includes two main components
– A browser-based annotation front-end
– An annotation-aware back-end index that provides
faceted search capabilities
33. Active Semantic Electronic Medical
Record - ASEMR
• New Drugs
• Adds interaction with current drugs
• Changes possible procedures to treat an
illness
• Insurance coverage changes
• Will pay for drug X, but not Y
• May need certain diagnosis before
expensive tests
• Physicians are required to keep track of an
ever-changing landscape
34. • A Document
• With semantic annotations
• entities linked to ontology
• terms linked to specialized lexicon
• With actionable information
• rules over semantic annotations
• rule violation indicated with alerts
Atrial fibrillation with prior stroke, currently
on Pradaxa, doing well.
Mild glucose intolerance and hyperlipidemia,
being treated by primary care.
ASEMR – Active Semantic Document
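A minimal sketch of rules over semantic annotations that raise alerts when violated. The drug names, the interaction pair, and the formulary below are placeholders rather than clinical facts, and `check_prescription` is a hypothetical helper, not the ASEMR implementation.

```python
# Sketch of an "active" document: annotations plus rules that raise alerts.
# All drug names, the interaction pair, and the formulary are placeholders.
INTERACTS_WITH = {frozenset({"drug_x", "drug_y"})}
FORMULARY = {"plan_a": {"drug_x"}}  # drugs the hypothetical insurance plan pays for

def check_prescription(new_drug, current_drugs, plan):
    """Return one alert string for every rule the new prescription violates."""
    alerts = []
    for drug in current_drugs:
        if frozenset({new_drug, drug}) in INTERACTS_WITH:
            alerts.append(f"interaction: {new_drug} with {drug}")
    if new_drug not in FORMULARY.get(plan, set()):
        alerts.append(f"not covered by {plan}: {new_drug}")
    return alerts

alerts = check_prescription("drug_y", current_drugs=["drug_x"], plan="plan_a")
print(alerts)
```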
35. • Type of ASD
• Three Ontologies
• Practice
Information about practice such as
patient/physician data
• Drug
Information about drugs, interaction,
formularies, etc.
• ICD/CPT
Describes the relationships between CPT
and ICD codes
ASEMR – Active Semantic Patient Record
43. Semantics and Services enabled
Problem Solving Environment for
T.cruzi - SPSE
• Majority of experimental data reside in labs
• Integration of lab data facilitates new insights
• Formulating queries against such data requires
deep technical knowledge
A Semantic Problem Solving Environment for Integrative Parasite Research:
Identification of Intervention Targets for Trypanosoma cruzi, 2012
45. • Integrated internal data with external databases, such as
KEGG, GO, and some datasets on TriTrypDB
• Developed semantic provenance framework and influenced
W3C community
• SPSE supports complex biological queries that help find
gene knockout, drug and/or vaccination targets. For
example:
• Show me proteins that are downregulated in the epimastigote
stage and exist in a single metabolic pathway.
• Give me the gene knockout summaries, both for plasmid
construction and strain creation, for all gene knockout targets that
are 2-fold upregulated in amastigotes at the transcript level and
that have orthologs in Leishmania but not in Trypanosoma brucei.
SPSE
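To make the first example query concrete, here is a small sketch over made-up records. The protein IDs, stages, pathways, and field names are all hypothetical; SPSE itself answers such questions over integrated semantic data rather than Python dictionaries.

```python
# Illustrative records only; protein IDs, stages, and pathways are made up.
proteins = [
    {"id": "P1", "downregulated_in": {"epimastigote"}, "pathways": {"glycolysis"}},
    {"id": "P2", "downregulated_in": {"epimastigote"}, "pathways": {"glycolysis", "TCA cycle"}},
    {"id": "P3", "downregulated_in": {"amastigote"}, "pathways": {"TCA cycle"}},
]

def downregulated_single_pathway(records, stage):
    """Proteins downregulated in the given stage that occur in exactly one pathway."""
    return [
        r["id"] for r in records
        if stage in r["downregulated_in"] and len(r["pathways"]) == 1
    ]

hits = downregulated_single_pathway(proteins, "epimastigote")
print(hits)
```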
46. Complex queries can also include:
- on-the-fly Web services execution to retrieve additional data
- inference rules to make implicit knowledge explicit
SPSE
47. • So many ontologies
• Rich in number of concepts
• Mostly concentrated on taxonomical
relationships
• Applications require domain relationships
• A is_symptom_of B
• C is_treated_with D
Knowledge Enrichment from Data
50. Knowledge Enrichment from Data
atrial fibrillation
hypertension
diabetes
chest pain
weight gain
discomfort in chest
rash skin
cough
weight loss
headache
edema
shortness of breath
fatigue
syncope
weight loss
chest pain
discomfort in chest
dizzy
shortness of breath
nausea
vomiting
headache
cough
weight gain
Diseases
Symptoms
Symptoms
From EMR From KB
Is edema symptom of atrial fibrillation?
Is edema symptom of hypertension?
Is edema symptom of diabetes?
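The question-generation step can be sketched as a set difference: symptoms observed in the EMR that no known disease-symptom link explains become questions for the domain expert. The toy knowledge base below is an illustrative subset, not the actual KB, and `candidate_questions` is a hypothetical helper.

```python
# Toy knowledge base: disease -> known symptoms (illustrative subset only).
kb = {
    "atrial fibrillation": {"chest pain", "shortness of breath", "dizzy"},
    "hypertension": {"headache", "nausea", "vomiting"},
    "diabetes": {"weight loss", "weight gain"},
}

def candidate_questions(diseases, observed_symptoms, kb):
    """Ask the expert only about observed symptoms the KB cannot explain."""
    covered = set().union(*(kb.get(d, set()) for d in diseases))
    unexplained = sorted(set(observed_symptoms) - covered)
    return [f"Is {s} a symptom of {d}?" for s in unexplained for d in diseases]

questions = candidate_questions(
    ["atrial fibrillation", "hypertension", "diabetes"],
    {"chest pain", "headache", "edema"},
    kb,
)
print(questions)
```

Only the unexplained symptom (edema) produces questions, so the expert reviews three candidate links instead of every disease-symptom pair.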
51. Domains
Cardiology
Orthopedics
Oncology
Neurology
Etc…
No of concepts 1008161
Problems(diseases, symptoms) 125778
Procedures 262360
Medicines 298993
Medical Devices 33124
Relationships 77261
is treated with (disease -> medication) 41182
is relevant procedure (procedure -> disease) 3352
is symptom of (symptom -> disease) 8299
contraindicated drug (medication -> disease) 24428
Knowledge Enrichment from Data
with the above
method
+
UMLS
healthline.com
druglib.com
52. • 80% unstructured healthcare data
• Pose challenges in
• Searching
• Understanding
• Mining
• Knowledge discovery
• Decision support
• Evidence based medicine
• Federal policies promote meaningful use and
pose constraints to healthcare system
Healthcare Challenge
53. Coding Complexity ICD-9 ICD-10
Diagnostic Codes 14,000 69,000
Procedure Codes 3,800 72,000
ICD-9
(Current)
ICD-10 Conversion
(1st Oct,2014)
Clinical
Documentation &
Coding-Billing
Challenges
Example: 821.01: ICD-9 code for “closed” Fractured Femur, or thigh bone.
Translates to 36 codes in ICD-10 with details regarding the precise nature of
fracture, which thigh was fractured, whether a delay in healing occurred etc.
Healthcare Challenge
54. • Traditional methods don’t work
• Understanding the context is crucial
Need to Do Better
Healthcare Challenge
57. ezHealth - Benefits
• Advanced search
• All hypertension patients with ejection
fraction <40
• All MI patients who are taking either beta-
blockers or ACE Inhibitors
• Patients diagnosed with Atrial Fibrillation on
Coumadin or Lovenox
• Support core-measure initiative
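Once EMR text has been annotated into structured fields, searches like those listed above reduce to simple filters. The sketch below assumes hypothetical record fields (`problems`, `ejection_fraction`, `medications`) and made-up patients; it is not the ezHealth query interface.

```python
# Hypothetical, already-annotated patient records; field names are assumptions.
patients = [
    {"id": 1, "problems": {"hypertension"}, "ejection_fraction": 35, "medications": set()},
    {"id": 2, "problems": {"hypertension"}, "ejection_fraction": 55, "medications": set()},
    {"id": 3, "problems": {"myocardial infarction"}, "ejection_fraction": 50,
     "medications": {"beta-blocker"}},
]

def hypertension_low_ef(records, threshold=40):
    """All hypertension patients with ejection fraction below the threshold."""
    return [
        r["id"] for r in records
        if "hypertension" in r["problems"] and r["ejection_fraction"] < threshold
    ]

def mi_on_either(records, drug_classes=frozenset({"beta-blocker", "ACE inhibitor"})):
    """All MI patients taking at least one of the given drug classes."""
    return [
        r["id"] for r in records
        if "myocardial infarction" in r["problems"] and r["medications"] & drug_classes
    ]

print(hypertension_low_ef(patients), mi_on_either(patients))
```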
58. Error Detection
EMR:
1. “Sepsis due to urinary tract infection….”
2. “Her prognosis is poor both short term and long term, however, we
will do everything possible to keep her alive and battle this infection."
SNM:40733004_infection SNM:68566005_infection_urinary_tract
A syntax based NLP extractor
(such as MedLEE) can extract
this term and annotate as
SNM:40733004_infection
By utilizing IntellegO and cardiology
background knowledge, we can more
accurately annotate the term as
SNM:68566005_infection_urinary_tract
*MedLEE
with usage of IntellegO
Problem Problem
*MedLEE is an NLP engine optimized to parse clinical documents
59. EMR: ”The patient is to receive 2 fluid boluses."
SNM:32457005_body_fluid
A syntax based NLP extractor
(such as MedLEE) can extract
this term and annotate as
SNM:32457005_body_fluid
MedLEE
Problem
Fluid is part of the bolus treatment, not a problem
with IntellegO
By utilizing IntellegO and cardiology
background knowledge, we can determine
that this is not a symptom – hence
annotation is incorrect.
Treatment
Error Detection
60. The balance of evidence would suggest
that his episode of atrial fibrillation seems
to be an isolated event
He has had no documented atrial
fibrillation since that time
Patient has atrial fibrillation
Patient does not have atrial
fibrillation
NLP
NLP
Atrial Fibrillation
Syncope
Is_symptom_of
Warfarin
Atenolol
Aspirin
Is_medication_for
Resolve Inconsistency
Using domain relationships we
validated that patient has atrial
fibrillation
Symptoms Medication
Medication
Medication
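A sketch of how domain relationships might break a tie between contradictory NLP assertions. The relationships are the ones shown on this slide; the resolution rule (any supporting symptom or medication confirms the disorder) is an assumed simplification, and `resolve` is a hypothetical helper.

```python
# Domain relationships taken from the slide's example.
IS_SYMPTOM_OF = {"syncope": "atrial fibrillation"}
IS_MEDICATION_FOR = {
    "warfarin": "atrial fibrillation",
    "atenolol": "atrial fibrillation",
    "aspirin": "atrial fibrillation",
}

def resolve(disorder, nlp_assertions, symptoms, medications):
    """If NLP gives both positive and negative assertions for a disorder,
    let supporting symptoms and medications decide (assumed rule)."""
    if len(set(nlp_assertions)) == 1:
        return nlp_assertions[0]  # no conflict to resolve
    support = [s for s in symptoms if IS_SYMPTOM_OF.get(s) == disorder]
    support += [m for m in medications if IS_MEDICATION_FOR.get(m) == disorder]
    return bool(support)

has_af = resolve(
    "atrial fibrillation",
    nlp_assertions=[True, False],
    symptoms=["syncope"],
    medications=["warfarin", "atenolol", "aspirin"],
)
print(has_af)
```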
61. She denies any chest pain but is not really
function due to leg stiffness, swelling an
shortness of breath
Regarding the shortness of breath, we will
send for a dobutamine stress
echocardiogram
Patient does not have
shortness of breath
Patient has shortness of breath
NLP
NLP
Shortness of Breath
Is_symptom_of
Obesity
Hypertension
Obstructive Sleep Apnea
Resolve Inconsistency
Using domain relationships we
validated that patient has
shortness of breath
Disorder
Disorder
Disorder
62. PREscription Drug abuse Online
Surveillance and Epidemiology -
PREDOSE
• Non-medical use of Prescription Drugs
• Fastest Growing Drug problem in US
• Director ONDCP Gil Kerlikowske, Epidemic*
• Pathway to heroin addiction
• Escalating accidental overdose deaths
• Current Epidemiological Data Systems
• Interactive Interviews
• Online Surveys
• Manual Coding
63. Specific Aims
Describe drug users’ knowledge, attitudes,
and behaviors related to illicit use of
Prescription Drugs (Content Analysis)
Describe temporal patterns of non-medical
use of Prescription Drugs
(Trend Analysis)
64. Overall Approach
1. Automate Data Collection
• Social Media - Online Web forums
2. Create Structured Domain Vocabulary
• Drug Abuse Ontology (DAO)
3. Automate Information Extraction
• Entity, Relationship, Triple, Sentiment, Template
4. Develop Tools for Data Analysis
a) Content Analysis - Content Explorer, Template Pattern
Explorer, Proximity Search
b) Trend Analysis - Trend explorer, Emerging pattern explorer
65. PREDOSE architecture (figure)
Stage 1. Data Collection
Stage 2. Automatic Coding
Stage 3. Data Analysis and Interpretation
Components shown: Web Forums, Web Crawler, Data Cleaning, Informal Text Database, Information Extraction Module, Entity Identification, Relationship Extraction, Sentiment Extraction, Triple Extraction, Drug Abuse Ontology (Schema), Triples/RDF Database, Semantic Web Database, PREDOSE Web Application, Temporal Analysis for Trend Detection, Qualitative and Quantitative Analysis of Drug User Knowledge, Attitudes and Behaviors
Labels shown: Opioid, Cannabinoid, Side Effect, Feeling
Triples shown:
[Buprenorphine has_slang_term bupe]
[Suboxone subClassOf Buprenorphine]
[Suboxone_Injection CAUSES Nausea]
66. Research Highlights
Drug Abuse Ontology
• First ontology on prescription drug abuse
Ontology-based Entity Identification
• Gold standard dataset 601 posts
• Buprenorphine – 33:1 Slang-to-drug mentions
• Loperamide – 24:1 Slang-to-drug mentions:
• 85% Precision, 72% Recall
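Ontology-based entity identification with slang can be sketched as a lexicon lookup followed by a precision/recall check against a gold standard. Only the `bupe` mapping comes from the slides; the other slang entries and both helper functions are illustrative assumptions, not the PREDOSE implementation.

```python
import re

# Tiny lexicon in the spirit of the Drug Abuse Ontology. Only "bupe" appears in
# the slides; the other slang entries are illustrative assumptions.
SLANG_TO_DRUG = {"bupe": "Buprenorphine", "subs": "Buprenorphine", "lope": "Loperamide"}
CANONICAL = {name.lower(): name for name in SLANG_TO_DRUG.values()}

def identify_drugs(post):
    """Map slang terms and drug names in a forum post to canonical drug names."""
    found = []
    for token in re.findall(r"[a-z]+", post.lower()):
        drug = SLANG_TO_DRUG.get(token) or CANONICAL.get(token)
        if drug:
            found.append(drug)
    return found

def precision_recall(predicted, gold):
    """Set-based precision and recall of predicted items against a gold standard."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall

mentions = identify_drugs("Took some bupe, then loperamide later")
print(mentions)
```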
67. Research Highlights
Template-based (Knowledge-Aware) Search
• Complex Information Needs
Solution
1. Ontology-based Search
2. Rule-based Search – Intensity, Frequency, Dosage, Interval
3. Context-Free Grammar – Queries Interpretable by PREDOSE
4. Data Sources – Ontology, Lexicon, Lexico-ontology, Alphabet
69. kHealth
Health information is now available from multiple sources
• medical records
• background knowledge
• social networks
• personal observations
• sensors
• etc.
70.
Foursquare is an online application which
integrates a persons physical location and
social network.
Community of enthusiasts that share experiences of
self-tracking and measurement.
FitBit Community allows the
automated collection and
sharing of health-related data,
goals, and achievements
kHealth
71.
Sensors, actuators, and mobile computing are playing an
increasingly important role in providing data for early phases of
the health-care life-cycle
This represents a fundamental shift:
• people are now empowered to monitor and manage their own health;
• and doctors are given access to more data about their patients
kHealth
74.
Personal Health Dashboard
1 2 3
Continuous Monitoring Personal Assessment Medical Service
Auxiliary Information – background knowledge, social/community support,
personal context, personal medical history
kHealth
81.
kHealth - Technology
Explanation: is the act of choosing the objects or events that best
account for a set of observations; often referred to as hypothesis
building
Discrimination: is the act of finding those properties that, if
observed, would help distinguish between multiple explanatory
features
82.
kHealth - Technology
Explanatory Feature: a feature that explains the set of
observed properties
ExplanatoryFeature ≡ ∃ssn:isPropertyOf⁻.{p1} ⊓ … ⊓ ∃ssn:isPropertyOf⁻.{pn}
elevated blood pressure
clammy skin
palpitations
Hypertension
Hyperthyroidism
Pulmonary Edema
Observed Property Explanatory Feature
Explanation
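Read as set operations, the definition above says that an explanatory feature is one that every observed property is a property of, i.e. the intersection of the feature sets of the observed properties. The sketch below assumes illustrative property-to-feature links; the slide lists the nodes but not the edges, so the links are assumptions.

```python
# Illustrative property -> feature links (which disorders a property "isPropertyOf").
# The slide shows the nodes but not the edges, so these links are assumptions.
IS_PROPERTY_OF = {
    "elevated blood pressure": {"Hypertension", "Hyperthyroidism", "Pulmonary Edema"},
    "clammy skin": {"Hyperthyroidism"},
    "palpitations": {"Hyperthyroidism", "Pulmonary Edema"},
}

def explanatory_features(observed):
    """Features that account for every observed property."""
    sets = [IS_PROPERTY_OF[p] for p in observed]
    return set.intersection(*sets) if sets else set()

print(explanatory_features(["elevated blood pressure", "palpitations"]))
```

Each additional observation can only shrink the set of candidate explanations, which is what makes choosing the next observation worthwhile.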
83.
kHealth - Technology
Discrimination
Expected Property: would be explained by every explanatory
feature
ExpectedProperty ≡ ∃ssn:isPropertyOf.{f1} ⊓ … ⊓ ∃ssn:isPropertyOf.{fn}
elevated blood pressure
clammy skin
palpitations
Hypertension
Hyperthyroidism
Pulmonary Edema
Expected Property Explanatory Feature
84.
kHealth - Technology
Discrimination
Not Applicable Property: would not be explained by any
explanatory feature
NotApplicableProperty ≡ ¬∃ssn:isPropertyOf.{f1} ⊓ … ⊓ ¬∃ssn:isPropertyOf.{fn}
elevated blood pressure
clammy skin
palpitations
Hypertension
Hyperthyroidism
Pulmonary Edema
Not Applicable Property Explanatory Feature
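The expected and not-applicable definitions can be sketched together: given the current explanatory features, a property is expected if every feature explains it and not applicable if none does. Treating the remainder as discriminating follows the description of discrimination earlier in the deck, but that third bucket and the property-to-feature links below are assumptions for illustration.

```python
# Illustrative property -> feature links; the edges are assumptions.
IS_PROPERTY_OF = {
    "elevated blood pressure": {"Hypertension", "Hyperthyroidism", "Pulmonary Edema"},
    "clammy skin": {"Hyperthyroidism"},
    "palpitations": {"Hyperthyroidism", "Pulmonary Edema"},
}

def classify_properties(explanatory):
    """Split properties into expected, not applicable, and discriminating,
    relative to a non-empty set of explanatory features."""
    expected, not_applicable, discriminating = set(), set(), set()
    for prop, features in IS_PROPERTY_OF.items():
        explained_by = features & explanatory
        if explained_by == explanatory:
            expected.add(prop)        # explained by every explanatory feature
        elif not explained_by:
            not_applicable.add(prop)  # explained by none of them
        else:
            discriminating.add(prop)  # observing it would narrow the candidates
    return expected, not_applicable, discriminating

expected, not_applicable, discriminating = classify_properties(
    {"Hyperthyroidism", "Pulmonary Edema"}
)
print(expected, not_applicable, discriminating)
```

In this toy setting, only "clammy skin" is worth observing next, since it separates the two remaining candidates.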
88.
kHealth Demo
An Ontological Approach to Focusing Attention and Enhancing Machine Perception on the Web, Applied Ontology 2011
Representation of Parsimonious Covering Theory in OWL-DL (OWLED 2011)
An Efficient Bit Vector Approach to Semantics-based Machine Perception in Resource-Constrained Devices (ISWC 2012)
Data Driven Knowledge Acquisition Method for Domain Knowledge Enrichment in Healthcare, BIBM 2012
90.
kHealth - Asthma
• Can we detect asthma/allergy early?
– Using data from on-body sensors, and environmental sensors
– Using knowledge from an asthma ontology, generated from asthma knowledge on
the Web and domain experts
– Generate a risk measure from collected data and background knowledge
• Can we characterize asthma/allergy progression?
– State of asthma patient may change over time
– Identifying risky progressions before worsening of the patient state
• Does the early detection of asthma/allergy, and
subsequent intervention/treatment, lead to
improved outcomes?
– Improved outcomes could be improved health (less serious symptoms), less need
for invasive treatments, preventive measures (e.g. avoiding risky environmental
conditions), less cost, etc.
91. • GO (well controlled)
– peak flow 80-100%*
– Good breathing and sleep: Acceleration reading pattern
– No cough: microphone
– Good physical activity: Acceleration reading pattern
• CAUTION (not well controlled)
– peak flow 60-80%*
– Cough and Wheeze: microphone
– Tight chest: Acceleration readings
– Wakes up at night: Acceleration reading pattern
• STOP (poor control)
– peak flow < 60%*
– Medicine not helping: medicine = TRUE and still in STOP state
– Breathing hard and fast: microphone
– Can’t walk or talk well: Acceleration and microphone
* Measured using peak flow meter
Asthma Control Level and
Corresponding Sensor Observations
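The peak-flow part of the three control levels can be sketched as a threshold function. The slide's ranges meet at 80% and 60%, so the boundary handling below (80% counts as GO, 60% as CAUTION) is an assumption, as is the function name; the other sensor cues (cough, activity, sleep) are left out.

```python
def control_level(peak_flow_percent):
    """Map a peak flow percentage to the GO / CAUTION / STOP zones.

    Boundary handling is an assumption: the slide's ranges touch at 80% and 60%.
    """
    if peak_flow_percent >= 80:
        return "GO"
    if peak_flow_percent >= 60:
        return "CAUTION"
    return "STOP"

for reading in (95, 70, 45):
    print(reading, control_level(reading))
```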
93.
Personal
Level Events
Population Level
Events
(Personal Level Events)
(Personalized Events) (Population Level Events)
Population-level
Events Relevant at
the Personal-level
Machine sensors:
Pollen levels
Pollution levels
Accelerometer
Peak flow meter
Medication tracking
Personal sensors:
Symptoms
(kHealth) (EventShop)
Qualify & Quantify
-Detect all the factors
influencing asthma
-Find the role of each
factor in influencing
asthma
Asthma Risk Profile
-Contextual information
to personalize risk
-Risk score computation
Asthma Mitigation
-Corrective action based
on risk score
What are the factors influencing my asthma?
What is the contribution of each of these factors?
How controlled is my asthma? (risk score)
What will be my action plan to manage asthma?
Storage
Pose Questions
Receive answers
Access/update patient
information
Machine sensors:
Pollen levels
Pollution levels
Personal sensors:
Symptoms
Asthma prevalence
94.
Community Spaces
Personal Spaces
Personal
Wheeze – Yes
Do you have tightness of chest? –Yes
Observations
Physical-Cyber-Social System
Health Signal Extraction
Health Signal Understanding
<Wheezing=Yes, time, location>
<ChestTightness=Yes, time, location>
<PollenLevel=Medium, time, location>
<Pollution=Yes, time, location>
<Activity=High, time, location>
Wheezing
ChestTightness
PollenLevel
Pollution
Activity
RiskCategory
<PollenLevel, ChestTightness, Pollution, Activity, Wheezing, RiskCategory>
<2, 1, 1, 3, 1, RiskCategory>
.
.
.
Actionable Information
Action: contact doctor now
Explanation: Increased activity is the primary cause of wheezing and high risk category
Expert Knowledge
Background Knowledge
tweet reporting pollution level
and asthma attacks
Acceleration readings from
on-phone sensors
Sensor and personal observations Signals from personal, personal
spaces, and community spaces
Risk Category assigned by doctors
Qualify
Quantify
Enrich
Outdoor pollen and pollution
96. Thank You
Visit Us @
www.knoesis.org
with additional background at http://knoesis.org/amit/hcls
97. Ohio Center of Excellence in Knowledge-enabled Computing -
An Ohio Center of Excellence in BioHealth Innovation
Wright State University
98. Amit Sheth’s PhD students
Ashutosh Jadhav
Hemant Purohit
Vinh Nguyen
Lu Chen
Pavan Kapanipathi
Pramod Anantharam
Sujan Perera
Alan Smith
Pramod Koneru
Maryam Panahiazar
Sarasi Lalithsena
Cory Henson
Kalpa Gunaratna
Delroy Cameron
Sanjaya Wijeratne
Wenbo Wang
Kno.e.sis in 2012 = ~100 researchers (15 faculty, ~50 PhD students)
Editor's Notes
Heterogeneity of data to be integrated (Variety)
Quality. How do you fix it? Measure it? How do you decide
Consumers are changed. Clinicians + drug makers + insurance companies. Technology-savvy users + gadgets. Put the text from 360
We have a lot of data and we are trying to use it meaningfully, but customers (users) are still not satisfied. So we need computers to understand the data.
What is semantic web? http://en.wikipedia.org/wiki/Semantic_Web Vast – huge data. Vague – define ‘young’, ‘tall’. Uncertainty – a patient might present a set of symptoms which correspond to a number of different distinct diagnoses, each with a different probability. Deceit – intentionally misleading.
The technology stack and usage of most popular technologies
Kno.e.sis products
This slide intends to justify the development of the tools Doozer, Scooner, Kino and iExplorer. A huge amount of knowledge exists in different formats and people are overloaded with knowledge/information; we need mechanisms for better exploration of knowledge, to help them find what they require (Scooner, iExplorer) and derive new knowledge.
Why Doozer? Knowledge is available in various formats, but it is hardly helpful if not in a structured format. But building a structured knowledgebase from the available formats is a challenge.
Human knowledge cycle. Doozer is one tool that supports this.
Forms of open knowledge: Wikipedia, LOD, formal models
Knowledge acquisition through Model creation
Hierarchy creation from wikipedia
Big picture
Doozer’s way of identifying relationships
Last two steps of knowledge cycle
Big picture. Kno.e.sis: NLP based triples - Cartic Ramakrishnan's and Pablo's work on open Information Extraction from biomedical text. Sentences in MedLine abstracts are parsed and split into Subject, Predicate and Object. In the Merge phase, only those triples that have a Subject and Object that can be mapped to the initial KB are added to the enriched KB. The difference with the BKR triples is that they were probably verified by NLM before being published, whereas the Knoesis triples went into the KB unverified, apart from having to match initial KB concepts.
Last two steps of knowledge cycle
Why scooner
demo
Semantic annotation maps target data resources to concepts in ontologies. Extra information is added to the resource to connect it to its corresponding concept(s) in the ontology. This system includes two main components: a browser-based annotation front-end, integrated with NCBO, and an annotation-aware back-end index that provides faceted search capabilities.
illustrates the user interface of the annotator plug-in. When the user highlights and right-clicks on a word or a phrase, the browser’s context menu includes the annotation as a phylogenetical concept menu item. Selecting this menu item brings up the annotations window, where the highlighted term is searched using the NCBO RESTful API and a detailed view of the available ontological terms is shown to the user to select from. The user can search or browse for a concept in any ontology hosted in NCBO. Once all the annotations are added, users can directly submit the annotations to a predefined (configurable through an options dialog) Kino instance by selecting the publish annotations menu item [3]. Kino supports generic domain annotations and is capable of providing facets on any domain. Kino is built on top of Apache SOLR, a facet-capable indexing and searching engine that is easily extensible. The current Kino framework supports three facets based on the SA-REST specification. The index manages the content of each annotation, the annotated text and the content of the document, hence users have the flexibility to search on the annotated concept as well as the document content, similar to a text-based search engine.
Novelty of this annotation process: annotate the term in XML with the triples from the ontology, not just the concepts from the ontology. 3 kinds of object values in annotation: literal object value; remote resource as an object value; a nested annotation as an object value. This is the first effort. Step 1: tree1 (s) is-close-match (p) tree (o). Step 2: tree (s) is-inferred-by (p) maximum-likelihood (o).
User can search for a specific term and get all the annotated documents with that specific term.
Knowledge and data are separated. There is no way to validate whether my data adheres to the knowledge and vice versa.
Architecture
Generate Novel hypothesis
The challenge. Why ASEMR?
How ASEMR?
How ASEMR?
The architecture
Why SPSE? Integration of data gives more insights, but the heterogeneity of data stands against the integration.
How SPSE
Benefits
why
EMR documents contain not only data/information but knowledge too. But the scattered nature of that knowledge makes it difficult to discover.
The big picture. The built knowledgebase should be able to explain the real-world data. We used this claim in reverse order: real-world data can be used to enhance the knowledgebase when it fails to explain the data. Scenario: extract all diseases from the document; generate all possible symptoms for these diseases using the knowledgebase; extract all symptoms from the document. If there are more symptoms in the document than in the generated set, this indicates that we might be missing some relationship between disease and symptoms. We use this indication to generate questions that can be answered by the domain expert; this will allow us to enrich the knowledgebase.
From EMR: we extract the diseases and symptoms (we have already annotated concepts in the EMR with our background knowledge). We generate the symptom coverage for the diseases found (the union of symptoms that each disease is attached to in the knowledge base). Now we have the observed symptoms and all possible symptoms. Assumption: observed symptoms should be a subset of all possible symptoms. Whenever we find a symptom in the observed list which is not in the all-possible list, we can generate the hypothesis and verify it with the domain expert. What we found is that edema is a symptom of hypertension. This method will reduce the workload of the domain expert. Imagine we have 50 diseases and 100 symptoms. Then there are 5000 possibilities, and the domain expert has to go through each and validate; but with this method we will ask the question only if we find evidence.
What we achieved? Not sure whether this slide is required. We used a lot of existing knowledgebases to build this knowledgebase. We extract the knowledge from the listed websites by crawling and then annotating the concepts using UMLS.
Unstructured data poses challenges in every field, but here is our attempt to overcome the challenge in healthcare. Traditional methods - IR, data mining, traditional NLP
People are waiting to harness the unstructured healthcare data for all these applications
Emphasize the capability of inferencing (only because we have a knowledgebase) and point out how difficult it is to formulate such queries if a knowledgebase is not available.
The EMR doc has these two sentences. ‘Urinary tract infection’ (first sentence) is correctly annotated, but ‘infection’ in the second one is not. The second ‘infection’ actually refers to ‘urinary tract infection’ in the first sentence, but the NLP engine does not understand this. We could find this because there is no evidence to suggest ‘infection’ in the document according to our knowledgebase. So after detecting this issue, we could annotate the second infection as urinary tract infection (this annotation is done manually). Detection is done with IntellegO. One could argue that annotating the second ‘infection’ as just infection does no harm because urinary tract infection is also an infection, but detection of these things helps to improve the annotation.
The NLP engine annotates the fluid as ‘body_fluid’, which is a symptom. But here the term ‘fluid’ does not refer to a symptom but rather to the form of medication, ‘boluses’. We could find this issue because there was no disease in the document to suggest the ‘body fluid’.
In this case NLP does not detect that the second statement is talking about history. But with the knowledgebase we have, we can say the patient actually has AF. So we resolve the inconsistency here. Example from document 673
NLP does not understand the first sentence. It attaches ‘not’ to shortness of breath, which is wrong according to the semantics of the sentence. But we can resolve this issue by using the knowledgebase. Example from document 595
Specific Aim 1: Has a special focus on recently approved abuse-deterrent formulation; It is now more difficult to obtain prescribed Oxycontin;
1. Non-medical use of prescription drugs is the fastest growing form of drug abuse in the US. Epidemic: Responding to America’s Prescription Drug Abuse Crisis. GATEWAY DRUG: National Survey on Drug Use and Health (NSDUH) - nearly one-third of people aged 12 and over who used drugs for the first time in 2009 began by using a prescription drug non-medically. 2. Purdue Pharma - best known for its pain-treatment product, OxyContin - OxyContin reformulation (Aug 2010). Pathway to heroin addiction [1] (2003). [1] Probable Relationship Between Opioid Abuse and Heroin Use - Robert G. Carlson, Ph.D. The Ohio Substance Abuse Monitoring (OSAM) Network (in Dayton, 10 subjects, aged 18 to 33 years, were interviewed). Accidental drug overdose death [2] (2008). [2] Recent changes in drug poisoning mortality in the United States by urban-rural status and by drug type. Paulozzi LJ, Xi Y. @ CDC 2008 - 1999-2004, degree of urbanization, found opioid abuse
Buprenorphine is a partial opioid agonist used in the management of addiction to opioids such as heroin, OxyContin and Vicodin. Prescribed daily dosages typically range from 4–24 mg.
Slang, abbreviations, equivalent concepts
Posts which mentioned Buprenorphine, benzodiazepine
Multimodal healthcare data
Recent advancements in observation mechanisms and data sharing
Sensors play a key role
But still we are here
We need to get here
Kno.e.sis kHealth idea. Ongoing work: simulating first two phases. Our product is MobileMD. Demo is at the end of the slides.
The Challenge. We have sensors to measure movements, heart rate, sleeping, galvanic skin response etc. But we don’t know how to aggregate.
Key ingredients which will help to understand the healthcare data (measurements)
Numbers -> abstractions -> knowledge integration (static knowledge about the domain, personal background) -> prediction. Advantages: early detection and alert generation
Observe data from different sensors at the same time.
System Architecture. Fig. shows an overview of the SemHealth architecture.
Sensors: all are Bluetooth sensors already utilized by the current k-Health application to measure weight, heart rate, and blood pressure.
Android application: reads sensor observations through Bluetooth; performs annotation on observations and generates percepts from those observations; uploads annotated observations and percepts to the server-side data store; retrieves data using DSU API and feeds data to DPU and/or DVU APIs; visualizes data through DVU API (considered a “nice to have” as existing visualization may be used as-is; will utilize existing graphing library for Android with Open mHealth-style API that may be translated to browser at a later time).
Server-side: Open mHealth compliant DSU and DPU APIs; triple data storage replaces existing SQLite database in k-Health application; existing k-Health reasoner now the brains behind DPU.
Examples of queries: Give a time during the last 5 days when blood pressure and heart rate were high for the selected patient. Give me the last time any patient exhibited pre-hypertension. Give me a patient who exhibited readings of pre-hypertension and normal heart rate, along with the most recent timestamps for both readings.
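The first example query can be sketched as a filter over timestamped abstractions. The record layout, the sample observations, and the `times_both_high` helper are hypothetical; the actual application stores annotated observations as triples and would answer this through its own query interface.

```python
from datetime import datetime, timedelta

# Hypothetical abstractions ("high" / "normal") already derived from raw readings.
observations = [
    {"patient": "p1", "time": datetime(2013, 6, 10, 8), "blood_pressure": "high", "heart_rate": "high"},
    {"patient": "p1", "time": datetime(2013, 6, 1, 8), "blood_pressure": "high", "heart_rate": "high"},
    {"patient": "p1", "time": datetime(2013, 6, 11, 8), "blood_pressure": "high", "heart_rate": "normal"},
]

def times_both_high(records, patient, now, days=5):
    """Times within the last `days` days when both abstractions were 'high'."""
    cutoff = now - timedelta(days=days)
    return [
        r["time"] for r in records
        if r["patient"] == patient
        and cutoff <= r["time"] <= now
        and r["blood_pressure"] == "high"
        and r["heart_rate"] == "high"
    ]

hits = times_both_high(observations, "p1", now=datetime(2013, 6, 12))
print(hits)
```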