Semantic Web & Web 3.0 empowering real world outcomes in biomedical research and clinical practices

Talk presented in Spain (WiMS 2013/UAM-Madrid, UMA-Malaga), June 2013.

Replaces earlier version at: http://www.slideshare.net/apsheth/semantic-technology-empowering-real-world-outcomes-in-biomedical-research-and-clinical-practices

Biomedical and translational research as well as clinical practice are increasingly data driven. Activities routinely involve large numbers of devices, data and people, resulting in challenges associated with volume, velocity (change), variety (heterogeneity) and veracity (provenance, quality). Equally important is the challenge of serving the needs of broader ecosystems of people and organizations, extending beyond traditional stakeholders like drug makers, clinicians and policy makers to increasingly technology-savvy and information-empowered patients. We believe that semantics is becoming the centerpiece of informatics solutions that convert data into meaningful, contextually relevant information and insights that lead to optimal decisions for translational research and 360-degree health, fitness and well-being.
In this talk, I will provide a series of snapshots of efforts in which a semantic approach and semantic technology are the key enablers. I will emphasize real-world and in-use projects, technologies and systems, involving significant collaborations between my team and biomedical researchers or practicing clinicians. Examples include:
• Active Semantic Electronic Medical Record
• Semantics and Services enabled Problem Solving Environment for T.cruzi (SPSE)
• Data Mining of Cardiology data
• Semantic Search, Browsing and Literature Based Discovery
• PREscription Drug abuse Online Surveillance and Epidemiology (PREDOSE)
• kHealth: development of knowledge-enhanced sensing and mobile computing applications (using low-cost sensors and smartphones), along with the ability to convert low-level observations into clinically relevant abstractions

Further details are at http://knoesis.org/amit/hcls

Published in: Education, Health & Medicine

  • Heterogeneity of data to be integrated (Variety)
  • Quality: How do you fix it? Measure it? How do you decide
  • Consumers have changed: clinicians + drug makers + insurance companies; technology-savvy users + gadgets. Put the text from 360
  • We have a lot of data and are trying to use it meaningfully, but customers (users) are still not satisfied, so we need computers to understand the data.
  • What is the Semantic Web? http://en.wikipedia.org/wiki/Semantic_Web Vast: huge data. Vague: define ‘young’, ‘tall’. Uncertainty: a patient might present a set of symptoms which correspond to a number of different distinct diagnoses, each with a different probability. Deceit: intentionally misleading.
  • The technology stack and usage of most popular technologies
  • Kno.e.sis products
  • This slide is intended to justify the development of the tools Doozer, SCOONER, Kino and iExplore. A huge amount of knowledge exists in different formats and people are overloaded with knowledge/information. We need mechanisms for better exploration of knowledge that help them find what they require (SCOONER, iExplore) and derive new knowledge.
  • Why Doozer? Knowledge is available in various formats, but it is hardly helpful unless it is in a structured format. Building a structured knowledgebase from the available formats is a challenge.
  • Human knowledge cycle. Doozer is one tool that supports this.
  • Forms of open knowledge: Wikipedia, LOD, formal models
  • Knowledge acquisition through model creation
  • Hierarchy creation from Wikipedia
  • Big picture
  • Doozer’s way of identifying relationships
  • Last two steps of knowledge cycle
  • Big picture. Kno.e.sis NLP-based triples: Cartic Ramakrishnan's and Pablo's work on open information extraction from biomedical text. Sentences in MedLine abstracts are parsed and split into subject, predicate and object. In the merge phase, only those triples whose subject and object can be mapped to the initial KB are added to the enriched KB. The difference from the BKR triples is that the BKR triples were probably verified by NLM before being published, whereas the Kno.e.sis triples went into the KB unverified, apart from having to match initial KB concepts.
  • Last two steps of knowledge cycle
  • Why SCOONER?
  • Demo
  • Semantic annotation maps target data resources to concepts in ontologies. Extra information is added to the resource to connect it to its corresponding concept(s) in the ontology. This system includes two main components: a browser-based annotation front-end, integrated with NCBO, and an annotation-aware back-end index that provides faceted search capabilities.
  • This slide illustrates the user interface of the annotator plug-in. When the user highlights and right-clicks on a word or a phrase, the browser’s context menu includes the annotation as a phylogenetical concept menu item. Selecting this menu item brings up the annotations window, where the highlighted term is searched using the NCBO RESTful API and a detailed view of the available ontological terms is shown for the user to select from. The user can search or browse for a concept in any ontology hosted in NCBO. Once all the annotations are added, users can directly submit them to a predefined Kino instance (configurable through an options dialog) by selecting the publish annotations menu item [3]. Kino supports generic domain annotations and is capable of providing facets on any domain. Kino is built on top of Apache Solr, a facet-capable indexing and search engine that is easily extensible. The current Kino framework supports three facets based on the SA-REST specification. The index manages the content of each annotation, the annotated text and the content of the document, so users have the flexibility to search on the annotated concept as well as on the document content, as in a text-based search engine.
  • Novelty of this annotation process: annotate the term in XML with triples from the ontology, not just concepts from the ontology. Three kinds of object values in an annotation: a literal object value; a remote resource as an object value; a nested annotation as an object value. This is the first effort. Step 1: tree1 (s) is-close-match (p) tree (o). Step 2: tree (s) is-inferred-by (p) maximum-likelihood (o).
  • The user can search for a specific term and get all the annotated documents with that term.
  • Knowledge and data are separated. There is no way to validate whether my data adheres to the knowledge and vice versa.
  • Architecture
  • Generate Novel hypothesis
  • The challenge. Why ASEMR?
  • How ASEMR?
  • How ASEMR?
  • The architecture
  • Why SPSE? Integration of data gives more insights, but the heterogeneity of the data stands in the way of integration.
  • How SPSE
  • Benefits
  • Why
  • EMR documents contain not only data/information but knowledge too. But the scattered nature of that knowledge makes it difficult to discover.
  • The big picture. The built knowledgebase should be able to explain the real-world data. We used this claim in reverse: real-world data can be used to enhance the knowledgebase when it fails to explain the data. Scenario: extract all diseases from the document; generate all possible symptoms for these diseases using the knowledgebase; extract all symptoms from the document. If there are more symptoms in the document than in the generated set, this indicates that we might be missing some relationship between diseases and symptoms. We use this indication to generate questions that can be answered by the domain expert, which allows us to enrich the knowledgebase.
  • From the EMR we extract the diseases and symptoms (we have already annotated concepts in the EMR with our background knowledge). We generate the symptom coverage for the diseases found (the union of the symptoms that each disease is attached to in the knowledgebase). Now we have the observed symptoms and all possible symptoms. Assumption: the observed symptoms should be a subset of all possible symptoms. Whenever we find a symptom in the observed list which is not in the all-possible list, we can generate a hypothesis and verify it with the domain expert. What we found is that edema is a symptom of hypertension. This method reduces the workload of the domain expert. Imagine we have 50 diseases and 100 symptoms: there are then 5000 possible disease-symptom pairs, and the domain expert would have to go through each one and validate it. With this method we ask a question only if we find evidence.
  • What did we achieve? Not sure whether this slide is required. We used a lot of existing knowledgebases to build this knowledgebase. We extracted the knowledge from the listed websites by crawling and then annotating the concepts using UMLS.
  • Unstructured data poses challenges in every field, but here is our attempt to overcome the challenge in healthcare. Traditional methods: IR, data mining, traditional NLP.
  • People are waiting to harness the unstructured healthcare data for all these applications.
  • Architecture. Data cleaning: adding section headers, modifying malformed section headers. De-identification. CAC: Computer Assisted Coding. CDI: Clinical Document Improvement.
  • Emphasize the capability of inferencing (possible only because we have a knowledgebase) and point out how difficult it is to formulate such queries if a knowledgebase is not available.
  • The EMR document has these two sentences. ‘Urinary tract infection’ (first sentence) is correctly annotated, but ‘infection’ in the second one is not. The second ‘infection’ actually refers to the ‘urinary tract infection’ in the first sentence, but the NLP engine does not understand this. We could find this because, according to our knowledgebase, there is no evidence in the document to suggest ‘infection’. After detecting this issue, we could annotate the second ‘infection’ as urinary tract infection (this annotation is done manually). Detection is done with IntellegO. One could argue that annotating the second ‘infection’ as just infection does no harm because urinary tract infection is also an infection, but detecting these cases helps to improve the annotation.
  • The NLP engine annotates ‘fluid’ as ‘body_fluid’, which is a symptom. But here the term ‘fluid’ does not refer to a symptom; it refers to the form of the medication ‘boluses’. We could find this issue because there was no disease in the document to suggest ‘body fluid’.
  • In this case the NLP engine does not detect that the second statement is talking about history. But with the knowledgebase we have, we can say the patient actually has AF, so we resolve the inconsistency here. Example from document 673.
  • The NLP engine does not understand the first sentence. It attaches ‘not’ to shortness of breath, which is wrong according to the semantics of the sentence. But we can resolve this issue by using the knowledgebase. Example from document 595.
  • Why PREDOSE? Data collection practices: interactive interviews, surveys. Data analysis limitations: coding. * www.judiciary.senate.gov/hearings/hearing.cfm?id=e655f9e2809e5476862f735da16cf3a9
  • Specific Aim 1 has a special focus on the recently approved abuse-deterrent formulation; it is now more difficult to obtain prescribed OxyContin.
  • 1. Non-medical use of prescription drugs is the fastest growing form of drug abuse in the US. Epidemic: Responding to America’s Prescription Drug Abuse Crisis. GATEWAY DRUG: National Survey on Drug Use and Health (NSDUH): nearly one-third of people aged 12 and over who used drugs for the first time in 2009 began by using a prescription drug non-medically. 2. Purdue Pharma: best known for its pain-treatment product OxyContin; OxyContin reformulation (Aug 2010). Pathway to heroin addiction [1] (2003). [1] Probable Relationship Between Opioid Abuse and Heroin Use, Robert G. Carlson, Ph.D., The Ohio Substance Abuse Monitoring (OSAM) Network (in Dayton, 10 subjects, aged 18 to 33 years, were interviewed). Accidental drug overdose death [2] (2008). [2] Recent changes in drug poisoning mortality in the United States by urban-rural status and by drug type. Paulozzi LJ, Xi Y. @ CDC 2008: 1999-2004, degree of urbanization, found opioid abuse
  • Buprenorphine is a partial opioid agonist used in the management of opioid addiction, including addiction to such opioids as heroin, OxyContin and Vicodin. Prescribed daily dosages typically range from 4–24mg
  • Slang, abbreviations, equivalent concepts
  • Posts which mentioned Buprenorphine, benzodiazepine
  • Multimodal healthcare data
  • Recent advancement in observation mechanisms and data sharing
  • Sensors play a key role
  • But still we are here
  • We need to get here
  • Kno.e.sis kHealth idea. Ongoing work: simulating the first two phases. Our product is MobileMD. Demo is at the end of the slides.
  • The challenge: we have sensors to measure movement, heart rate, sleep, galvanic skin response, etc., but we don’t know how to aggregate.
  • Key ingredients which will help to understand the healthcare data (measurements)
  • Numbers -> abstractions -> knowledge integration (static knowledge about the domain, personal background) -> prediction. Advantages: early detection and alert generation.
  • http://www.vitalograph.com/products/monitors-screeners/asthma/asma-1-bluetooth
  • Observe data from different sensors at the same time.
  • System architecture: the figure shows an overview of the SemHealth architecture. Sensors: all are Bluetooth sensors already utilized by the current k-Health application to measure weight, heart rate, and blood pressure. Android application: reads sensor observations through Bluetooth; performs annotation on observations and generates percepts from those observations; uploads annotated observations and percepts to the server-side data store; retrieves data using the DSU API and feeds data to the DPU and/or DVU APIs; visualizes data through the DVU API (considered a “nice to have”, as the existing visualization may be used as-is; will utilize an existing graphing library for Android with an Open mHealth-style API that may be translated to the browser at a later time). Server side: Open mHealth compliant DSU and DPU APIs; triple data storage replaces the existing SQLite database in the k-Health application; the existing k-Health reasoner is now the brains behind the DPU.
  • Examples of queries: Give a time during the last 5 days when blood pressure and heart rate were high for the selected patient. Give me the last time any patient exhibited pre-hypertension. Give me a patient who exhibited readings of pre-hypertension and normal heart rate, along with the most recent timestamps for both readings.
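The example queries in the last kHealth note can be sketched over a small in-memory list of readings. This is only an illustration: the `Reading` record layout, the function names, the `HIGH_SYSTOLIC` and `HIGH_HEART_RATE` thresholds, and the pre-hypertension range are assumptions made for the sketch, not values or APIs from the k-Health system, which according to the notes keeps annotated observations in a triple store.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Illustrative thresholds; assumptions for this sketch, not clinical guidance.
HIGH_SYSTOLIC = 140                 # mmHg
HIGH_HEART_RATE = 100               # beats per minute
PRE_HYPERTENSION = range(120, 140)  # systolic mmHg, upper bound exclusive


@dataclass(frozen=True)
class Reading:
    patient: str
    time: datetime
    systolic: int
    heart_rate: int


def high_bp_and_hr(readings, patient, now, days=5):
    """Times in the last `days` days when both values were high for `patient`."""
    start = now - timedelta(days=days)
    return [
        r.time
        for r in readings
        if r.patient == patient
        and start <= r.time <= now
        and r.systolic >= HIGH_SYSTOLIC
        and r.heart_rate >= HIGH_HEART_RATE
    ]


def last_pre_hypertension(readings):
    """Most recent reading, from any patient, in the pre-hypertension range."""
    hits = [r for r in readings if r.systolic in PRE_HYPERTENSION]
    return max(hits, key=lambda r: r.time, default=None)


# Made-up sample data.
now = datetime(2013, 6, 10, 12, 0)
readings = [
    Reading("p1", now - timedelta(days=1), 150, 110),
    Reading("p1", now - timedelta(days=7), 155, 105),  # outside the 5-day window
    Reading("p2", now - timedelta(days=2), 125, 70),
    Reading("p1", now - timedelta(days=3), 130, 72),
]

recent_high = high_bp_and_hr(readings, "p1", now)
latest_pre = last_pre_hypertension(readings)
```

In a triple store the same questions would be expressed as queries over annotated observations; the list comprehension here only stands in for that filtering step.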

    1. Semantic Web & Web 3.0 empowering real world outcomes in biomedical research and clinical practices. Amit Sheth, Kno.e.sis – Ohio Center of Excellence in Knowledge-enabled Computing, Wright State University, Dayton, Ohio. http://knoesis.org http://knoesis.org/amit/hcls Special thanks: Sujan Perera; Ack: Kno.e.sis HCLS team and collaborators. Talk presented in Spain (WiMS 2013/UAM-Madrid, UMA-Malaga), June 2013
    2. Integration
    3. Semantics
    4. Role of Semantic Web in HCLS • Improve the machine understandability and processing of all types of data by • Modeling and Background Knowledge • Annotation • Complex Querying/Analysis, Reasoning • Improve Insight from Biomedical Data • Improve Clinical Decision Making • Vastness/Volume • Velocity • Variety/Heterogeneity • Vagueness, Uncertainty, Inconsistency, Deceit (section labels: Objective, Challenges, Approach)
    5. Identifiers: URI; Character set: UNICODE; Syntax: XML; Data interchange: RDF; Querying: SPARQL; Taxonomies: RDFS; Ontologies: OWL; Rules: RIF/SWRL; Unifying logic; Proof; Trust; Cryptography; User interface and applications (layer labels: Querying, Data/Knowledge Representation, Knowledge Representation). Lots of need for NLP, ML, IR, and other technologies – SW significantly empowers these and closes some critical gaps
    6. HCLS Apps @ Kno.e.sis • Semantic Search and Browsing (Doozer++, SCOONER, iExplore) • Semantics and Services enabled Problem Solving Environment for T.cruzi (SPSE) • Active Semantic Electronic Medical Record (ASEMR) • Mining and Analysis of EMR (ezFIND, ezMeasure, ezCAC) • kHealth (ADHF, Asthma, …) • PREscription Drug abuse Online Surveillance and Epidemiology (PREDOSE) (categories: Biomedical, Healthcare, Epidemiology)
    7. Kno.e.sis Bioinformatics toolkit: Doozer++, iExplore, SCOONER, Kino. Insights, Better Understanding, Intuitive Browsing, Hypothesis Generation, Personalization, Knowledge Exploration. http://knoesis.org/opensource http://knoesis.org/showcase
    8. Knowledge Acquisition – Doozer++ • Building an ontology is costly • Large volume of knowledge available in semi-structured/unstructured format • No assurance for the credibility of such knowledge
    9. Knowledge Acquisition – Doozer++: Circle of Knowledge http://knoesis.org/node/71
    10. Knowledge Acquisition – Doozer++
    11. Knowledge Acquisition – Doozer++
    12. Knowledge Acquisition – Doozer++: j.1:category_science, j.1:category_neuroscience, j.1:category_cognitive_science, j.1:category_psychology, j.1:category_behavior, j.1:category_philosophy_of_mind, j.1:category_brain, j.1:category_psycholinguistics, j.1:category_neurology, j.1:category_neurophysiology (10 classes…)
    13. Doozer++ Demo. Knowledge Acquisition from Community-Generated Content. Continuous Semantics to Analyze Real-Time Data, IEEE Internet Computing (Volume 14)
    14. Beyond Hierarchy • Identify Relationships • Textual pattern-based extraction for known relationships • Facts available in background knowledge • Find evidence for such facts • Combined evidence from many different patterns increases the certainty of a relationship between the entities
    15. Validating Knowledge • Evaluating acquired knowledge • Explicit • User can vote for facts • Facts presented based on user interests • Implicit • User’s browsing history used as an indication of which propositions are correct and interesting • Now it adds validated knowledge back to the community
    16. Building Human Performance & Cognition Ontology (HPCO). Base Hierarchy from Wikipedia; SenseLab Neuroscience Ontologies; Meta Knowledgebase; PubMed Abstracts; Focused pattern based extraction; Initial KB creation; Enriched Knowledgebase; HPC Keywords; Kno.e.sis: NLP based triples; NLM: Rule based BKR triples; Merge. http://wiki.knoesis.org/index.php/Human_Performance_and_Cognition_Ontology
    17. Use Case for HPCO • Number of Entities – 2 million • Number of non-trivial facts – 3 million • NLP Based*: calcium-binding protein S100B modulates long-term synaptic plasticity • Pattern Based**: Olfactory Bulb has physical part of anatomic structure Mitral cell. * Joint Extraction of Compound Entities and Relationships from Biomedical Literature, Web Intel. 2008 * A Framework for Schema-Driven Relationship Discovery from Unstructured Text, ISWC 2006 ** On Demand Creation of Focused Domain Models using Top-down and Bottom-up Information Extraction, Technical Report
    18. Knowledge-based Browsing - SCOONER • Knowledge-based browsing: relations window, inverse relations, creating trails • Persistent Projects: Work bench, Browsing history, Comments, Filtering • Collaboration: Comments, Dashboard, Exporting projects, Importing projects
    19. SCOONER Demo. SCOONER Details. An Up-to-date Knowledge-Based Literature Search and Exploration Framework for Focused Bioscience Domains, IHI 2012 - 2nd ACM SIGHIT International Health Informatics Symposium
    20. Kino • An integrated suite of tools that enables scientists to annotate – unstructured resources – semi-structured resources • Annotates documents by accessing NCBO ontologies, via the NCBO Web API. • Includes two main components – A browser-based annotation front-end – An annotation-aware back-end index that provides faceted search capabilities
    21. Kino Architecture
    22. Example: Annotating Literature
    23. Annotating the XML file with NCBO Ontology (diagram labels: rel, rel, Ontology concept)
    24. Kino Search • Search the annotated documents with the concept of interest • Return all annotated documents with the selected concept
    25. Kino Demo. Kino: A Generic Document Management System for Biologists Using SA-REST and Faceted Search. ICSC 2011
    26. iExplore: Interactive Browsing and Exploring Biomedical Knowledge
    27. Architecture
    28. Generate Novel Hypothesis
    29. iExplore video. iExplore Demo
    30. Turning to Applications with End Users
    31. Active Semantic Electronic Medical Record - ASEMR • New Drugs • Adds interaction with current drugs • Changes possible procedures to treat an illness • Insurance coverage changes • Will pay for drug X, but not Y • May need certain diagnosis before expensive tests • Physicians are required to keep track of an ever-changing landscape
    32. ASEMR – Active Semantic Document • A Document • With semantic annotations • entities linked to ontology • terms linked to specialized lexicon • With actionable information • rules over semantic annotations • rule violation indicated with alerts. Example text: Atrial fibrillation with prior stroke, currently on Pradaxa, doing well. Mild glucose intolerance and hyperlipidemia, being treated by primary care.
    33. ASEMR – Active Semantic Patient Record • Type of ASD • Three Ontologies • Practice: Information about practice such as patient/physician data • Drug: Information about drugs, interaction, formularies, etc. • ICD/CPT: Describes the relationships between CPT and ICD codes
    34. ASEMR – Practice Ontology Hierarchy: encounter, ancillary, event, insurance_carrier, insurance, facility, insurance_plan, patient, person, practitioner, insurance_policy, owl:thing, ambularory_episode
    35. ASEMR – Drug Ontology Hierarchy: owl:thing, prescription_drug_brand_name, brandname_undeclared, brandname_composite, prescription_drug, monograph_ix_class, cpnum_group, prescription_drug_property, indication_property, formulary_property, non_drug_reactant, interaction_property, property, formulary, brandname_individual, interaction_with_prescription_drug, interaction, indication, generic_individual, prescription_drug_generic, generic_composite, interaction_with_non_drug_reactant, interaction_with_monograph_ix_class
    36. ASEMR
    37. 37. Before ASEMR (chart: number of charts per month/year, Jan 04 to Jul 05; series: Same Day, Back Log)
    38. 38. After ASEMR (chart: number of charts per month/year, Sept 05 to Mar 06; series: Same Day, Back Log)
    39. 39. ASEMR - Benefits • Error Prevention – Patient care – Insurance • Decision Support – Patient satisfaction – Reimbursement • Efficiency/Time – Real-time chart completion – "Semantic" and automated linking with billing
    40. 40. ASEMR Demo. Active Semantic Electronic Medical Record, ISWC 2006
    41. 41. Semantics and Services enabled Problem Solving Environment for T. cruzi - SPSE • The majority of experimental data resides in labs • Integration of lab data facilitates new insights • Formulating queries against such data requires deep technical knowledge. Reference: A Semantic Problem Solving Environment for Integrative Parasite Research: Identification of Intervention Targets for Trypanosoma cruzi, 2012
    42. 42. SPSE • Data Sources – Internal Lab Data – External Database • Ontological Infrastructure – Parasite Lifecycle – Parasite Experiment • Query Processing – Cuebee
    43. 43. SPSE • Integrated internal data with external databases, such as KEGG, GO, and some datasets on TriTrypDB • Developed semantic provenance framework and influenced W3C community • SPSE supports complex biological queries that help find gene knockout, drug and/or vaccination targets. For example: – Show me proteins that are downregulated in the epimastigote stage and exist in a single metabolic pathway. – Give me the gene knockout summaries, both for plasmid construction and strain creation, for all gene knockout targets that are 2-fold upregulated in amastigotes at the transcript level and that have orthologs in Leishmania but not in Trypanosoma brucei.
    44. 44. SPSE. Complex queries can also include: – on-the-fly Web services execution to retrieve additional data – inference rules to make implicit knowledge explicit
    45. 45. Knowledge Enrichment from Data • So many ontologies – Rich in number of concepts – Mostly concentrated on taxonomical relationships • Applications require domain relationships – A is_symptom_of B – C is_treated_with D
    46. 46. Knowledge Enrichment from Data (figure labels: Data, Information, Knowledge)
    47. 47. Knowledge Enrichment from Data (figure labels: EMR, Background knowledge, IntellegO, Modified background knowledge). References: Data Driven Knowledge Acquisition Method for Domain Knowledge Enrichment in Healthcare, BIBM 2012; An Ontological Approach to Focusing Attention and Enhancing Machine Perception on the Web, Applied Ontology 2011
    48. 48. Knowledge Enrichment from Data. Diseases: atrial fibrillation, hypertension, diabetes. Symptoms from EMR: chest pain, weight gain, discomfort in chest, rash skin, cough, weight loss, headache, edema, shortness of breath, fatigue, syncope. Symptoms from KB: weight loss, chest pain, discomfort in chest, dizzy, shortness of breath, nausea, vomiting, headache, cough, weight gain. Resulting questions: Is edema a symptom of atrial fibrillation? Is edema a symptom of hypertension? Is edema a symptom of diabetes?
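The comparison on this slide can be sketched as a set difference: symptoms seen in the EMR but missing from the knowledge base become candidate questions for each co-occurring disease. The function name and the question wording are assumptions, and the sketch generates a question for every unmatched symptom, whereas the slide shows only the edema case.

```python
DISEASES = ["atrial fibrillation", "hypertension", "diabetes"]

EMR_SYMPTOMS = [
    "chest pain", "weight gain", "discomfort in chest", "rash skin", "cough",
    "weight loss", "headache", "edema", "shortness of breath", "fatigue", "syncope",
]

KB_SYMPTOMS = [
    "weight loss", "chest pain", "discomfort in chest", "dizzy",
    "shortness of breath", "nausea", "vomiting", "headache", "cough", "weight gain",
]


def candidate_questions(diseases, emr_symptoms, kb_symptoms):
    """Ask about every EMR symptom that the knowledge base does not yet cover."""
    known = set(kb_symptoms)
    unmatched = [s for s in emr_symptoms if s not in known]
    return [f"Is {s} a symptom of {d}?" for s in unmatched for d in diseases]


questions = candidate_questions(DISEASES, EMR_SYMPTOMS, KB_SYMPTOMS)
for question in questions:
    print(question)
```

In practice the answers would come from a domain expert or another validation step; the sketch only produces the candidate relationships to check.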
    49. 49. Knowledge Enrichment from Data. Domains: Cardiology, Orthopedics, Oncology, Neurology, etc. No. of concepts: 1008161 • Problems (diseases, symptoms): 125778 • Procedures: 262360 • Medicines: 298993 • Medical Devices: 33124. Relationships: 77261 • is treated with (disease -> medication): 41182 • is relevant procedure (procedure -> disease): 3352 • is symptom of (symptom -> disease): 8299 • contraindicated drug (medication -> disease): 24428. Sources: the above method + UMLS, healthline.com, druglib.com
    50. 50. Healthcare Challenge • 80% of healthcare data is unstructured • Poses challenges in – Searching – Understanding – Mining – Knowledge discovery – Decision support – Evidence-based medicine • Federal policies promote meaningful use and pose constraints on the healthcare system
    51. 51. Healthcare Challenge: Clinical Documentation & Coding-Billing Challenges. Coding complexity, ICD-9 (current) vs. ICD-10 (conversion 1st Oct, 2014): Diagnostic codes 14,000 vs. 69,000; Procedure codes 3,800 vs. 72,000. Example: 821.01 is the ICD-9 code for "closed" fractured femur, or thigh bone. It translates to 36 codes in ICD-10, with details regarding the precise nature of the fracture, which thigh was fractured, whether a delay in healing occurred, etc.
    52. 52. Healthcare Challenge: Need to Do Better • Traditional methods don't work • Understanding the context is crucial
    53. 53. Healthcare Challenge – The Solution: NLP + Semantics, applied to Search, Mining, Decision Support, Knowledge Discovery, and Evidence-based Medicine
    54. 54. ezHealth (figure components: cTAKES, ezNLP, ezKB; applications: ezFIND, ezMeasure, ezCDI, ezCAC; www.ezdi.us). Example annotations: <problem value="Asthma" cui="C0004096"/> <med value="Losartan" code="52175:RXNORM" /> <med value="Spiriva" code="274535:RXNORM" /> <procedure value="EKG" cui="C1623258" />
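As an illustration of how annotations in this shape could be consumed downstream, the sketch below parses the four example elements with Python's standard `xml.etree.ElementTree`. The wrapping `<annotations>` root and the `parse_annotations` helper are assumptions added so the fragment is well-formed; they are not part of ezHealth.

```python
import xml.etree.ElementTree as ET

# The four annotation elements shown on the slide.
FRAGMENT = """
<problem value="Asthma" cui="C0004096"/>
<med value="Losartan" code="52175:RXNORM" />
<med value="Spiriva" code="274535:RXNORM" />
<procedure value="EKG" cui="C1623258" />
"""


def parse_annotations(fragment):
    """Return (tag, value, identifier) tuples, one per annotation element."""
    # Wrap the elements in a hypothetical root so the text parses as one document.
    root = ET.fromstring(f"<annotations>{fragment}</annotations>")
    records = []
    for element in root:
        identifier = element.get("cui") or element.get("code")
        records.append((element.tag, element.get("value"), identifier))
    return records


records = parse_annotations(FRAGMENT)
for record in records:
    print(record)
```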
    55. 55. ezHealth - Benefits • Advanced search – All hypertension patients with ejection fraction <40 – All MI patients who are taking either beta-blockers or ACE inhibitors – Patients diagnosed with atrial fibrillation on Coumadin or Lovenox • Support core-measure initiative
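Once notes are reduced to structured annotations, the first two searches above become simple filters. The sketch below assumes a hypothetical in-memory patient index; the records, field names, and function names are invented for illustration and say nothing about how ezFIND is implemented.

```python
# Hypothetical patient index; every record and field name is illustrative.
PATIENTS = [
    {"id": "p1", "problems": {"hypertension"}, "ejection_fraction": 35,
     "med_classes": {"ace inhibitor"}},
    {"id": "p2", "problems": {"hypertension"}, "ejection_fraction": 55,
     "med_classes": set()},
    {"id": "p3", "problems": {"myocardial infarction"}, "ejection_fraction": 45,
     "med_classes": {"beta-blocker"}},
    {"id": "p4", "problems": {"myocardial infarction"}, "ejection_fraction": None,
     "med_classes": {"statin"}},
]


def hypertension_with_low_ef(patients, threshold=40):
    """All hypertension patients whose recorded ejection fraction is below threshold."""
    return [
        p["id"] for p in patients
        if "hypertension" in p["problems"]
        and p["ejection_fraction"] is not None
        and p["ejection_fraction"] < threshold
    ]


def mi_on_beta_blocker_or_ace(patients):
    """All MI patients taking a beta-blocker or an ACE inhibitor."""
    wanted = {"beta-blocker", "ace inhibitor"}
    return [
        p["id"] for p in patients
        if "myocardial infarction" in p["problems"] and p["med_classes"] & wanted
    ]


print(hypertension_with_low_ef(PATIENTS))
print(mi_on_beta_blocker_or_ace(PATIENTS))
```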
    56. 56. Error Detection. EMR: 1. "Sepsis due to urinary tract infection…" 2. "Her prognosis is poor both short term and long term, however, we will do everything possible to keep her alive and battle this infection." MedLEE*: a syntax-based NLP extractor (such as MedLEE) can extract this term and annotate it as SNM:40733004_infection (Problem). With usage of IntellegO: by utilizing IntellegO and cardiology background knowledge, we can more accurately annotate the term as SNM:68566005_infection_urinary_tract (Problem). *MedLEE is an NLP engine optimized to parse clinical documents
    57. 57. Error Detection. EMR: "The patient is to receive 2 fluid boluses." MedLEE: a syntax-based NLP extractor (such as MedLEE) can extract this term and annotate it as SNM:32457005_body_fluid (Problem). With IntellegO: by utilizing IntellegO and cardiology background knowledge, we can determine that this is not a symptom, hence the annotation is incorrect. Fluid is part of the bolus treatment, not a problem (Treatment).
    58. 58. Resolve Inconsistency. Sentence 1: "The balance of evidence would suggest that his episode of atrial fibrillation seems to be an isolated event" (NLP: Patient has atrial fibrillation). Sentence 2: "He has had no documented atrial fibrillation since that time" (NLP: Patient does not have atrial fibrillation). Domain relationships: Syncope (Symptom) Is_symptom_of Atrial Fibrillation; Warfarin, Atenolol, Aspirin (Medication) Is_medication_for Atrial Fibrillation. Using domain relationships we validated that the patient has atrial fibrillation
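One way to read this slide: when NLP yields contradictory assertions about a problem, look for supporting evidence elsewhere in the record through domain relationships. The sketch below encodes only the relationships named on the slide. The decision rule (any related symptom or medication in the record counts as support) and the function name are assumptions, not the authors' published method.

```python
# Domain relationships named on the slide.
IS_SYMPTOM_OF = {"atrial fibrillation": {"syncope"}}
IS_MEDICATION_FOR = {"atrial fibrillation": {"warfarin", "atenolol", "aspirin"}}


def resolve_inconsistency(problem, nlp_assertions, record_symptoms, record_medications):
    """Return (has_problem, evidence) for possibly conflicting NLP assertions.

    nlp_assertions holds one boolean per sentence (True = "patient has problem").
    Assumed rule: on conflict, the problem is considered present if any related
    symptom or medication appears elsewhere in the patient's record.
    """
    if len(set(nlp_assertions)) == 1:
        return nlp_assertions[0], []  # no conflict, nothing to resolve
    symptoms = IS_SYMPTOM_OF.get(problem, set()) & set(record_symptoms)
    medications = IS_MEDICATION_FOR.get(problem, set()) & set(record_medications)
    evidence = sorted(symptoms | medications)
    return bool(evidence), evidence


result = resolve_inconsistency(
    "atrial fibrillation",
    nlp_assertions=[True, False],
    record_symptoms={"syncope"},
    record_medications={"warfarin", "atenolol", "aspirin"},
)
print(result)
```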
    59. 59. Resolve Inconsistency. Sentence 1: "She denies any chest pain but is not really function due to leg stiffness, swelling and shortness of breath" (NLP: Patient does not have shortness of breath). Sentence 2: "Regarding the shortness of breath, we will send for a dobutamine stress echocardiogram" (NLP: Patient has shortness of breath). Domain relationships: Shortness of Breath Is_symptom_of Obesity, Hypertension, Sleep Apnea Obstructive (Disorders). Using domain relationships we validated that the patient has shortness of breath
    60. 60. PREscription Drug abuse Online Surveillance and Epidemiology - PREDOSE • Non-medical use of prescription drugs – Fastest growing drug problem in the US – Director ONDCP Gil Kerlikowske, Epidemic* – Pathway to heroin addiction – Escalating accidental overdose deaths • Current epidemiological data systems – Interactive interviews – Online surveys – Manual coding
    61. 61. Specific Aims • Describe drug users' knowledge, attitudes, and behaviors related to illicit use of prescription drugs (Content Analysis) • Describe temporal patterns of non-medical use of prescription drugs (Trend Analysis)
    62. 62. Overall Approach 1. Automate Data Collection • Social Media - Online Web forums 2. Create Structured Domain Vocabulary • Drug Abuse Ontology (DAO) 3. Automate Information Extraction • Entity, Relationship, Triple, Sentiment, Template 4. Develop Tools for Data Analysis a) Content Analysis - Content Explorer, Template Pattern Explorer, Proximity Search b) Trend Analysis - Trend Explorer, Emerging Pattern Explorer
    63. 63. PREDOSE architecture (figure). Stages: Stage 1. Data Collection; Stage 2. Automatic Coding; Stage 3. Data Analysis and Interpretation. Components: Web Forums, Web Crawler, Data Cleaning, Informal Text Database, Information Extraction Module (Entity Identification, Sentiment Extraction, Relationship Extraction, Triple Extraction), Drug Abuse Ontology (Schema), Semantic Web Database, Triples/RDF Database, Temporal Analysis for Trend Detection, Qualitative and Quantitative Analysis of Drug User Knowledge, Attitudes and Behaviors, PREDOSE Web Application. Example entity types: Opioid, Cannabinoid, Side Effect, Feeling. Example triples: [Buprenorphine has_slang_term bupe] [Suboxone subClassOf Buprenorphine] [Suboxone_Injection CAUSES Nausea]
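The three example triples can illustrate how a triple store supports ontology-based entity identification, for instance mapping a slang term in a forum post back to its drug. The sketch uses plain Python tuples rather than an RDF library; `match` and `drug_for_slang` are hypothetical helper names, and matching is case-sensitive by assumption.

```python
# Example triples from the slide, stored as (subject, predicate, object).
TRIPLES = [
    ("Buprenorphine", "has_slang_term", "bupe"),
    ("Suboxone", "subClassOf", "Buprenorphine"),
    ("Suboxone_Injection", "CAUSES", "Nausea"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    ]


def drug_for_slang(triples, mention):
    """Map a slang mention to its drug entity, or None if it is unknown."""
    hits = match(triples, predicate="has_slang_term", obj=mention)
    return hits[0][0] if hits else None


print(drug_for_slang(TRIPLES, "bupe"))
print(match(TRIPLES, predicate="CAUSES"))
```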
    64. 64. Research Highlights. Drug Abuse Ontology • First ontology on prescription drug abuse. Ontology-based Entity Identification • Gold standard dataset: 601 posts • Buprenorphine – 33:1 slang-to-drug mentions • Loperamide – 24:1 slang-to-drug mentions • 85% precision, 72% recall
    65. 65. Research Highlights. Template-based (Knowledge-Aware) Search • Complex information needs. Solution: 1. Ontology-based Search 2. Rule-based Search – Intensity, Frequency, Dosage, Interval 3. Context-Free Grammar – Queries Interpretable by PREDOSE 4. Data Sources – Ontology, Lexicon, Lexico-ontology, Alphabet
    66. 66. Loperamide-Withdrawal Discovery: Extra-medical Use of Loperamide (figure labels: Entity, +ve Sentiment, Opiated Effect)
    67. 67. kHealth. Health information is now available from multiple sources • medical records • background knowledge • social networks • personal observations • sensors • etc.
    68. 68. kHealth • Foursquare is an online application which integrates a person's physical location and social network. • Community of enthusiasts that share experiences of self-tracking and measurement. • FitBit Community allows the automated collection and sharing of health-related data, goals, and achievements
    69. 69. kHealth. Sensors, actuators, and mobile computing are playing an increasingly important role in providing data for early phases of the health-care life-cycle. This represents a fundamental shift: • people are now empowered to monitor and manage their own health; • and doctors are given access to more data about their patients
    70. 70. kHealth
    71. 71. kHealth: Personal Health Dashboard
    72. 72. kHealth: Personal Health Dashboard. 1. Continuous Monitoring 2. Personal Assessment 3. Medical Service. Auxiliary Information – background knowledge, social/community support, personal context, personal medical history
    73. 73. kHealth: ?
    74. 74. kHealth – Key Ingredients • Background Knowledge • Social Network Input • Personal Observations • Personal Medical History
    75. 75. kHealth (figure labels: Abstractions, Observations)
    76. 76. kHealth - Technology (figure; entities: Observer, Quality, Entity, Perceiver; relations: observes, inheres in, perceives, sends focus, sends observation)
    77. 77. kHealth - Technology: Background Knowledge as Bi-partite Graph
    78. 78. kHealth - Technology. Explanation: the act of choosing the objects or events that best account for a set of observations; often referred to as hypothesis building. Discrimination: the act of finding those properties that, if observed, would help distinguish between multiple explanatory features
    79. 79. kHealth - Technology: Explanation. Explanatory Feature: a feature that explains the set of observed properties. ExplanatoryFeature ≡ ∃ssn:isPropertyOf⁻.{p1} ⊓ … ⊓ ∃ssn:isPropertyOf⁻.{pn}. Figure: properties (elevated blood pressure, clammy skin, palpitations) and features (Hypertension, Hyperthyroidism, Pulmonary Edema), with columns labeled Observed Property and Explanatory Feature
    80. 80. kHealth - Technology: Discrimination. Expected Property: would be explained by every explanatory feature. ExpectedProperty ≡ ∃ssn:isPropertyOf.{f1} ⊓ … ⊓ ∃ssn:isPropertyOf.{fn}. Figure: properties (elevated blood pressure, clammy skin, palpitations) and features (Hypertension, Hyperthyroidism, Pulmonary Edema), with columns labeled Expected Property and Explanatory Feature
    81. 81. kHealth - Technology: Discrimination. Not Applicable Property: would not be explained by any explanatory feature. NotApplicableProperty ≡ ¬∃ssn:isPropertyOf.{f1} ⊓ … ⊓ ¬∃ssn:isPropertyOf.{fn}. Figure: properties (elevated blood pressure, clammy skin, palpitations) and features (Hypertension, Hyperthyroidism, Pulmonary Edema), with columns labeled Not Applicable Property and Explanatory Feature
    82. 82. kHealth - Technology: Discrimination. Discriminating Property: is neither expected nor not-applicable. DiscriminatingProperty ≡ ¬ExpectedProperty ⊓ ¬NotApplicableProperty. Figure: properties (elevated blood pressure, clammy skin, palpitations) and features (Hypertension, Hyperthyroidism, Pulmonary Edema), with columns labeled Discriminating Property and Explanatory Feature
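The four definitions above can be read as set operations over the bipartite background knowledge. The sketch below is a plain-Python reading of those definitions, not the OWL-DL or bit-vector implementations cited in the deck. The edges in `KB` (which property belongs to which feature) are assumed for illustration, because the figure's links did not survive in this transcript.

```python
# Assumed bipartite background knowledge: feature -> properties it explains.
# The property and feature names come from the slides; the edges are illustrative.
KB = {
    "Hypertension": {"elevated blood pressure"},
    "Hyperthyroidism": {"elevated blood pressure", "clammy skin", "palpitations"},
    "Pulmonary Edema": {"elevated blood pressure", "palpitations"},
}


def explanatory_features(kb, observed):
    """Features that explain every observed property."""
    return {f for f, props in kb.items() if observed <= props}


def classify_properties(kb, observed):
    """Split all known properties into expected, not-applicable, discriminating."""
    features = explanatory_features(kb, observed)
    all_props = set().union(*kb.values())
    # Expected: a property of every explanatory feature.
    expected = {p for p in all_props if all(p in kb[f] for f in features)}
    # Not applicable: a property of no explanatory feature.
    not_applicable = {p for p in all_props if not any(p in kb[f] for f in features)}
    # Discriminating: neither expected nor not applicable.
    discriminating = all_props - expected - not_applicable
    return features, expected, not_applicable, discriminating


features, expected, not_applicable, discriminating = classify_properties(
    KB, {"elevated blood pressure"}
)
print(sorted(features))
print(sorted(discriminating))
```

With these assumed edges, observing only elevated blood pressure leaves all three features as explanations, and clammy skin and palpitations are the properties worth checking next because each would rule out at least one of them.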
    83. 83. kHealth Demo. References: An Ontological Approach to Focusing Attention and Enhancing Machine Perception on the Web, Applied Ontology 2011; Representation of Parsimonious Covering Theory in OWL-DL (OWLED 2011); An Efficient Bit Vector Approach to Semantics-based Machine Perception in Resource-Constrained Devices (ISWC 2012); Data Driven Knowledge Acquisition Method for Domain Knowledge Enrichment in Healthcare, BIBM 2012
    84. 84. kHealth
    85. 85. kHealth - Asthma • Can we detect asthma/allergy early? – Using data from on-body sensors and environmental sensors – Using knowledge from an asthma ontology, generated from asthma knowledge on the Web and domain experts – Generate a risk measure from collected data and background knowledge • Can we characterize asthma/allergy progression? – State of asthma patient may change over time – Identifying risky progressions before worsening of the patient state • Does the early detection of asthma/allergy, and subsequent intervention/treatment, lead to improved outcomes? – Improved outcomes could be improved health (less serious symptoms), less need for invasive treatments, preventive measures (e.g. avoiding risky environmental conditions), less cost, etc.
    86. 86. Asthma Control Level and Corresponding Sensor Observations • GO (well controlled) – peak flow 80-100%* – Good breathing and sleep: acceleration reading pattern – No cough: microphone – Good physical activity: acceleration reading pattern • CAUTION (not well controlled) – peak flow 60-80%* – Cough and wheeze: microphone – Tight chest: acceleration readings – Wakes up at night: acceleration reading pattern • STOP (poor control) – peak flow < 60%* – Medicine not helping: medicine = TRUE and still in STOP state – Breathing hard and fast: microphone – Can't walk or talk well: acceleration and microphone. * Measured using peak flow meter
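The peak-flow thresholds above map directly to a three-way classification. The sketch below covers only the peak-flow criterion, not the sensor-derived observations. The function name is hypothetical, and because the slide's ranges overlap at their boundaries, assigning exactly 80% to GO and exactly 60% to CAUTION is an assumption.

```python
def asthma_control_level(peak_flow_percent):
    """Classify asthma control from peak flow (percent of personal best).

    Boundary handling is an assumption: 80 maps to GO and 60 maps to CAUTION.
    Readings above 100 are also treated as GO.
    """
    if peak_flow_percent < 0:
        raise ValueError("peak flow percentage cannot be negative")
    if peak_flow_percent >= 80:
        return "GO"
    if peak_flow_percent >= 60:
        return "CAUTION"
    return "STOP"


for reading in (95, 70, 45):
    print(reading, asthma_control_level(reading))
```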
    87. 87. Overall Landscape (figure labels: Physical, Cyber, Social; Data Collection, Analysis, Action; Domain Experts, Domain Knowledge, Risk Model, Action Model). Example actions: Take medication before going to work; Avoid going out in the evening due to high pollen levels. Source: http://ngs.ics.uci.edu/blog/?p=1478
    88. 88. Figure: Personal Level Events (kHealth) and Population Level Events (EventShop) combine into Personalized Events, i.e. population-level events relevant at the personal level. Personal level: machine sensors (pollen levels, pollution levels, accelerometer, peak flow meter, medication tracking) and personal sensors (symptoms). Population level: machine sensors (pollen levels, pollution levels) and personal sensors (symptoms, asthma prevalence). Qualify & Quantify: detect all the factors influencing asthma; find the role of each factor in influencing asthma. Asthma Risk Profile: contextual information to personalize risk; risk score computation. Asthma Mitigation: corrective action based on risk score. Questions: What are the factors influencing my asthma? What is the contribution of each of these factors? How controlled is my asthma? (risk score) What will be my action plan to manage asthma? Storage: pose questions, receive answers, access/update patient information
    89. 89. Physical-Cyber-Social System (figure): Health Signal Extraction and Health Signal Understanding. Observations: sensor and personal observations from personal, personal spaces, and community spaces, e.g. Wheeze – Yes; Do you have tightness of chest? – Yes; acceleration readings from on-phone sensors; outdoor pollen and pollution; tweet reporting pollution level and asthma attacks. Extracted signals: <Wheezing=Yes, time, location> <ChestTightness=Yes, time, location> <PollenLevel=Medium, time, location> <Pollution=Yes, time, location> <Activity=High, time, location>. Feature tuples: <PollenLevel, ChestTightness, Pollution, Activity, Wheezing, RiskCategory>, e.g. <2, 1, 1, 3, 1, RiskCategory>, with the risk category assigned by doctors. Steps: Qualify, Quantify, Enrich, using expert knowledge and background knowledge. Actionable information: Action: contact doctor now. Explanation: Increased activity is the primary cause of wheezing and high risk category
    90. 90. Acknowledgements • Collaborators: AHC (Dr. Agrawal), CITAR-WSU, ezDI (ezdi.us), NLM (Dr. Bodenreider), CTEGD-UGA (Dr. Minning/Prof. Tarleton), NCBO - Stanford, Wellcome Trust, AFRL, Boonshoft Sch of Med – WSU (Dr. Forbis, …) • Funding: NIH (NHLBI R01: 1R01HL087795-01A1; NIDA: R21 DA030571), NSF, AFRL, Industry…
    91. 91. Thank You. Visit us @ www.knoesis.org, with additional background at http://knoesis.org/amit/hcls
    92. 92. Ohio Center of Excellence in Knowledge-enabled Computing - An Ohio Center of Excellence in BioHealth Innovation, Wright State University
    93. 93. Amit Sheth's PhD students: Ashutosh Jadhav, Hemant Purohit, Vinh Nguyen, Lu Chen, Pavan Kapanipathi, Pramod Anantharam, Sujan Perera, Alan Smith, Pramod Koneru, Maryam Panahiazar, Sarasi Lalithsena, Cory Henson, Kalpa Gunaratna, Delroy Cameron, Sanjaya Wijeratne, Wenbo Wang. Kno.e.sis in 2012 = ~100 researchers (15 faculty, ~50 PhD students)
