Healthcare innovations at Kno.e.sis



This talk, given to the executive committee of the Boonshoft School of Medicine, introduces some of our projects on clinical and healthcare applications and health informatics, including consumer health behavior and social media use in healthcare. I focus on personalized digital health, handling and mining of healthcare big data, a high-level description of innovations, and especially applications involving clinical partners that empower patients, support better clinical decision making, reduce clinicians' information overload, or improve clinical outcomes. [Because some of the evaluations are still under way, some of these benefits have yet to be assessed quantitatively and qualitatively.]

2 min. video on a Personalized Digital Health application (Asthma control in Children):

Also see: for related information.

  • Starting slide

    Various Big Data problems: traditional examples vs. examples of what we are doing.
    Variety and velocity matter more than volume (the kHealth problem). People will be interested in Smart Data.
    Traditional ML techniques, high-performance computing, statistics. Smart Data is data at a human level of abstraction.
  • Larry Smarr is a professor at the University of California, San Diego
    He was diagnosed with Crohn’s disease
    What’s interesting about this case is that Larry diagnosed himself
    He is a pioneer in the area of Quantified Self, which uses sensors to monitor physiological symptoms
    Through this process he discovered inflammation, which led him to the discovery of Crohn’s disease
    This type of self-tracking is becoming more and more common
  • Compute machine perception inferences (i.e., explanation and discrimination) of high complexity on resource-constrained devices in milliseconds

    Difference between the other systems and what this system provides
  • Intelligence at the edge: shipping computation and domain models to the edge (distributed)
  • For every 1 death from prescription drug overdose there are:
    10 users admitted for treatment
    32 users admitted to the emergency department
    130 people who are users/dependent
    825 non-medical users of prescription drugs

    White House Office of National Drug Control Policy (ONDCP) launched Epidemic (May 24, 2011)
  • Epidemiologist’s Approach
    Data collection from interviews, surveys
    Content Analysis using Coding

    Computer Scientists’ Approach
    Automate Data Collection
    Multiple sources of rich data
    Automate Content Analysis
    Information Extraction
    Trend Analysis
  • Sample post from a user that was just discharged from rehab facility. Sent home with Suboxone and Phenobarbital treatment drugs
    Phenobarbital - an anti-anxiety and anticonvulsant barbiturate, used to treat anxiety and seizures

    This post contains entities, which require structured representations to resolve.

    We created the Drug Abuse Ontology (DAO), the first ontology for prescription drug abuse.

    The ontology is very important because of the pervasive use of slang.
    In a manually created gold standard set of 601 posts the following was observed:
    33:1 Buprenorphine
    24:1 Loperamide
  • INTENSITY – more than, abnormal, in excess of, too much
    DRUG-FORM – ointment, tablet, pill, film
    INTERVAL – for several years
  • Loperamide is sold over the counter (OTC) in Imodium

    Yellow – positive sentiments
    Pink – Entities

    Green – curious finding - indication of getting high in the process

    Mention the practice of Megadosing!!
  • Background knowledge is used to explain the patient notes.
    "Explain" means that each symptom should be explained by at least one disorder in the document
    If at least one symptom is not explained, we generate hypotheses based on this observation
    Initially, all the disorders in the document become candidates
    We developed a filtering mechanism to filter out hypotheses with low confidence
    We keep the hypotheses with high confidence
  • More at:

  • Healthcare innovations at Kno.e.sis

    1. 1. Put Knoesis Banner Healthcare Innovations at Kno.e.sis Presentation to the Boonshoft School of Medicine Executive Committee, July 10, 2014 Amit Sheth Ohio Center of Excellence in Knowledge-enabled Computing (Kno.e.sis) Wright State University, USA
    2. 2. • Among top universities in the world in World Wide Web (cf: 10-yr impact, Microsoft Academic Search: among top 10 in June2014) • Largest academic group in the US in Semantic Web + Social/Sensor Webs, Mobile/Cloud/Cognitive Computing, Big Data, IoT, Health/Clinical & Biomedicine Applications • Exceptional student success: internships and jobs at top salary (IBM Watson/Research, MSR, Amazon, CISCO, Oracle, Yahoo!, Samsung, research universities, NLM, startups ) • 100 researchers including 15 World Class faculty (>3K citations/faculty) and ~45 PhD students- practically all funded • Extensive research for largely multidisciplinary projects; world class resources; industry sponsorships/collaborations (Google, IBM, …) 2
    3. 3. Kno.e.sis in 2014 = ~100 researchers (15 faculty, ~50 PhD students) Amit Sheth’s PhD students: Ashutosh Jadhav, Hemant Purohit, Vinh Nguyen, Lu Chen, Pavan Kapanipathi, Sujan Perera, Pramod Anantharam, Alan Smith, Swapnil Soni, Maryam Panahiazar, Sarasi Lalithsena, Shreyansh Batt, Kalpa Gunaratna, Delroy Cameron, Sanjaya Wijeratne, Wenbo Wang 3 Special thanks: This presentation covers some of the work of these researchers.
    4. 4. • 80% of doctors will eventually become obsolete: Vinod Khosla, VC and founder of Sun Microsystems • “The Doctor is (Always) In: Reinventing the Doctor- Patient Relationship for the 21st Century” [Dr. J. Shlain]. More data is generated under patient control and outside clinical system. Patient empowerment, reimbursement changes and AHA. • #dHealth and #IoT are two hottest hashtags at CES and SXSW 4 Healthcare is changing way too fast
    5. 5. The Patient of the Future MIT Technology Review, 2012 5
    6. 6. 6 Collaborators
    7. 7. 7 Healthcare Innovation at Kno.e.sis (with subset of applications)
    8. 8. 8 kHealth: Knowledge empowered personalized digital mhealth With applications to: ADHF, GI, Asthma, [Geriatrics] Contact: Prof. Amit Sheth
    9. 9. Brief Introduction Video
    10. 10. 10 Data Overload for Patients/health aficionados Providing actionable information in a timely manner is crucial to avoid information overload or fatigue Sleep data Community data Personal Schedule Activity data Personal health records
    11. 11. ‘FOR human’: Improving Human Experience Weather Application 11 Weather Application Asthma Healthcare Application Action in the Physical World Personal Detection of events, such as wheezing sound, indoor temperature, humidity, dust, and CO2 level Close the window at home during day to avoid CO2 inflow, to avoid asthma attacks at night Public Health Population Level
    12. 12. 12 Making sense of sensor data with
    13. 13. Through physical monitoring and analysis, our cellphones could act as an early warning system to detect serious health conditions, and provide actionable information canary in a coal mine knowledge-enabled healthcare 13 kHealth
    14. 14. 14 kHealth to Manage ADHF (Acute Decompensated Heart Failure)
    15. 15. 15 25 million 300 million $50 billion 155,000 593,000 Asthma People in the U.S. are diagnosed with asthma (7 million are children)1. People suffering from asthma worldwide2. Spent on asthma alone in a year2 Hospital admissions in 20063 Emergency department visits in 20063 1 2 3Akinbami et al. (2009). Status of childhood asthma in the United States, 1980–2007. Pediatrics,123(Supplement 3), S131-S145.
    16. 16. WHY Big Data to Smart Data? Healthcare example Asthma is a multifactorial disease with health signals spanning personal, public health, and population levels. Understanding relationships between health signals and asthma attacks for providing actionable information 16 Value Can we detect the asthma severity level? Can we characterize asthma control level? What risk factors influence asthma control? What is the contribution of each risk factor? semantics Velocity Veracity Variety Volume Real-time health signals from personal level (e.g., Wheezometer, NO in breath, accelerometer, microphone), public health (e.g., CDC, Hospital EMR), and population level (e.g., pollen level, CO2) arriving continuously in fine grained samples potentially with missing information and uneven sampling frequencies.
    17. 17. Sensors and their observations Asthma Control => Daily Medication Choices for starting therapy Not Well Controlled Poor Controlled Severity Level of Asthma (Recommended Action) (Recommended Action) (Recommended Action) for understanding asthma Intermittent Asthma SABA prn - - Mild Persistent Asthma Low dose ICS Medium ICS Medium ICS Moderate Persistent Asthma Medium dose ICS alone Or with LABA/montelukast Medium ICS + LABA/Montelukast Or High dose ICS Medium ICS + LABA/Montelukast Or High dose ICS* Severe Persistent Asthma High dose ICS with LABA/montelukast Needs specialist care Needs specialist care ICS= inhaled corticosteroid, LABA = inhaled long-acting beta2-agonist, SABA= inhaled short-acting beta2-agonist ; *consider referral to specialist Asthma Control and Actionable Information 17 Personal, Public Health, and Population Level Signals for Monitoring Asthma
    18. 18. 18 Personal Health Score and Vulnerability Score At Discharge Health Score Non-compliance Poor economic status No living assistance Vulnerability Score Well Controlled Low Well Controlled Very low Not Well Controlled High Not Well Controlled Medium Poor Controlled Very High Poor Controlled High Estimation of readmission vulnerability based on the personal health score
    19. 19. 19 Health Signal Extraction to Understanding Physical-Cyber-Social System Observations Health Signal Extraction Health Signal Understanding Personal Population Level Acceleration readings from on-phone sensors Wheeze – Yes Do you have tightness of chest? – Yes Risk Category assigned by doctors <Wheezing=Yes, time, location> <ChestTightness=Yes, time, location> <PollenLevel=Medium, time, location> <Pollution=Yes, time, location> <Activity=High, time, location> PollenLevel Wheezing ChestTightness Pollution Activity PollenLevel Wheezing ChestTightness Pollution Activity RiskCategory <PollenLevel, ChestTightness, Pollution, Activity, Wheezing, RiskCategory> <2, 1, 1, 3, 1, RiskCategory> <2, 1, 1, 3, 1, RiskCategory> <2, 1, 1, 3, 1, RiskCategory> <2, 1, 1, 3, 1, RiskCategory> . . . Background Knowledge Expert Knowledge Sensor and personal observations tweet reporting pollution level and asthma attacks Signals from personal, personal spaces, and community spaces Qualify Quantify Enrich Outdoor pollen and pollution Public Health Well Controlled - continue Not Well Controlled – contact nurse Poor Controlled – contact doctor
    20. 20. 20 Health Signal Extraction Challenges Social streams have been used to extract many near real-time events. Twitter provides access to rich signals, but tweets are noisy, informal, inconsistently capitalized, redundant, and lacking in context. We formalize event extraction from tweets as a sequence labeling problem. Example: Now you know why you’re miserable! Very High Alert for B-ALLERGEN Ragweed I-ALLERGEN pollen. B-FACILITY Oklahoma I-FACILITY Allergy I-FACILITY Clinic says it’s an extreme exposure situation. How do we know the event phrases, and who creates the training set? (Manual creation is ruled out.) Idea: background knowledge is used to create the training set, e.g., typing information becomes the label for a concept.
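The idea on slide 20, letting background knowledge supply the training labels, can be illustrated with a small sketch. It assumes a tiny hypothetical lookup table (`KNOWLEDGE`) that maps known phrases to their types, and it uses whitespace tokenization as a simplification. Only the ALLERGEN and FACILITY labels and the example tweet come from the slide. Everything else is illustrative, not the authors' implementation.

```python
# Hypothetical background knowledge: phrase -> type. The type of a known
# concept becomes its label; these two entries mirror the slide's example.
KNOWLEDGE = {
    ("ragweed", "pollen"): "ALLERGEN",
    ("oklahoma", "allergy", "clinic"): "FACILITY",
}


def bio_label(tokens):
    """Assign B-/I-/O labels by longest match against KNOWLEDGE."""
    labels = ["O"] * len(tokens)
    lowered = [t.lower().strip(".,!?") for t in tokens]
    i = 0
    while i < len(tokens):
        match = None
        for phrase, kind in KNOWLEDGE.items():
            n = len(phrase)
            if tuple(lowered[i:i + n]) == phrase:
                if match is None or n > match[0]:
                    match = (n, kind)
        if match:
            n, kind = match
            labels[i] = "B-" + kind
            for j in range(i + 1, i + n):
                labels[j] = "I-" + kind
            i += n
        else:
            i += 1
    return list(zip(tokens, labels))


tweet = ("Very High Alert for Ragweed pollen. Oklahoma Allergy Clinic "
         "says it's an extreme exposure situation")
labeled = bio_label(tweet.split())
```

Sentences labeled this way could then serve as automatically generated training data for a sequence labeler, which is the role the slide assigns to background knowledge.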
    21. 21. intelligence at the edge Approach 1: Send all sensor observations to the cloud for processing Approach 2: downscale semantic processing so that each device is capable of machine perception 21 Henson et al. 'An Efficient Bit Vector Approach to Semantics-based Machine Perception in Resource-Constrained Devices', ISWC 2012.
    22. 22. Use bit vector encodings and their operations to encode prior knowledge and execute semantic reasoning 010110001101 0011110010101 1000110110110 101100011010 0111100101011 000110101100 0110100111 22 Efficient execution of machine perception
    23. 23. Efficiency Improvement • Problem size increased from 10’s to 1000’s of nodes • Time reduced from minutes to milliseconds • Complexity growth reduced from polynomial, between O(n^3) and O(n^4), to linear, O(n) 23 Evaluation on a mobile device
    24. 24. 1 Translate low-level data to high-level knowledge Machine perception can be used to convert low-level sensory signals into high-level knowledge useful for decision making 2 Prior knowledge is the key to perception Using SW technologies, machine perception can be formalized and integrated with prior knowledge on the Web 3 Intelligence at the edge By downscaling semantic inference, machine perception can execute efficiently on resource-constrained devices 24 Semantic Perception for smarter analysis: 3 ideas to take away
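As a rough illustration of slides 21 to 24, the two perception inferences named in the notes (explanation and discrimination) can be written as a few bitwise operations. This is a minimal sketch, not the encoding from the cited ISWC 2012 paper. The property names, the `FEATURES` table, and the use of plain Python integers as bit vectors are all assumptions made for the example.

```python
# Hypothetical prior knowledge. Each property gets one bit position, and each
# feature (here, a condition) is the OR of the properties it can account for.
PROPERTIES = ["wheezing", "chest_tightness", "cough", "fever"]
BIT = {name: 1 << i for i, name in enumerate(PROPERTIES)}

FEATURES = {
    "asthma": BIT["wheezing"] | BIT["chest_tightness"] | BIT["cough"],
    "bronchitis": BIT["wheezing"] | BIT["cough"] | BIT["fever"],
    "common_cold": BIT["cough"] | BIT["fever"],
}


def encode(observed):
    """Pack a list of observed property names into one integer bit vector."""
    bits = 0
    for name in observed:
        bits |= BIT[name]
    return bits


def explain(observed_bits):
    """Explanation: features whose vector covers every observed property."""
    return [name for name, bits in FEATURES.items()
            if (bits & observed_bits) == observed_bits]


def discriminate(observed_bits, candidates):
    """Discrimination: unobserved properties expected by some candidates but
    not by all of them, so observing one would narrow the candidate set."""
    union, common = 0, ~0
    for name in candidates:
        union |= FEATURES[name]
        common &= FEATURES[name]
    useful = union & ~common & ~observed_bits
    return [p for p in PROPERTIES if BIT[p] & useful]


observed = encode(["wheezing", "cough"])
candidates = explain(observed)
next_checks = discriminate(observed, candidates)
```

Each inference here costs one pass over the features with constant-time bit operations per feature, which gives a feel for why a bit-vector encoding suits a resource-constrained device.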
    25. 25. 25 PREDOSE: Social media analysis driven epidemiology Application: Prescription drug abuse and beyond Contact: Delroy Cameron
    26. 26. 26 PREDOSE: Prescription Drug abuse Online Surveillance and Epidemiology Bridging the gap between researcher and policy makers Early identification of emerging patterns and trends in abuse Kno.e.sis - Ohio Center of Excellence in Knowledge-enabled Computing CITAR - Center for Interventions Treatment and Addictions Research D. Cameron, G. A. Smith, R. Daniulaityte, A. P. Sheth, D. Dave, L. Chen, G. Anand, R. Carlson, K. Z. Watkins, R. Falck. PREDOSE: A Semantic Web Platform for Drug Abuse Epidemiology using Social Media. Journal of Biomedical Informatics. July 2013 (in press)
    27. 27. PREDOSE: Prescription Drug abuse Online Surveillance and Epidemiology In 2008, there were 14,800 prescription painkiller deaths* • Drug Overdose Problem in US • 100 people die everyday from drug overdoses • 36,000 drug overdose deaths in 2008 • Close to half were due to prescription drugs * Gil Kerlikowske Director, ONDCP Launched May 2011 27
    28. 28. PREDOSE: Bringing Epidemiologists and Computer Scientists together Large Data Sample Sizes Access hard-to-reach Populations Early Identification and Detection of Trends Interviews Online Surveys Sample Biases Manual Effort Group Therapy: Automatic Data Collection Not Scalable Epidemiologist Qualitative Coding Problems Computer Scientist Automate Information Extraction & Content Analysis 28
    29. 29. [Figure: PREDOSE platform architecture. Stage 1: Data Collection; Stage 2: Automatic Coding; Stage 3: Data Analysis and Interpretation. Labeled components: Web Forums; Web Crawler; Informal Text Database; Information Extraction Module; Entity Identification; Sentiment Extraction; Relationship Extraction; Triple Extraction; Drug Abuse Ontology Schema; Semantic Web Database; Triples/RDF Database; Qualitative and Quantitative Analysis of Drug User Knowledge, Attitudes and Behaviors; Temporal Analysis for Trend Detection; PREDOSE Web Application.] 29
    30. 30. Sentiment Extraction feel pretty damn good +ve feel great experience sucked -ve didn’t do shit bad headache Entity Identification Subutex Suboxone subClassOf Buprenorphine has_slang_term bupe subClassOf has_slang_term bupey I was sent home with 5 x 2 mg Suboxones. I also got a bunch of phenobarbital (I took all 180 mg and it didn't do shit except make me a walking zombie for 2 days). I waited 24 hours after my last 2 mg dose of Suboxone and tried injecting 4 mg of the bupe. It gave me a bad headache, for hours, and I almost vomited. I could feel the bupe working but overall the experience sucked. Of course, junkie that I am, I decided to repeat the experiment. Today, after waiting 48 hours after my last bunk 4 mg injection, I injected 2 mg. There wasn't really any rush to speak of, but after 5 minutes I started to feel pretty damn good. So I injected another 1 mg. That was about half an hour ago. I feel great now. Triples Codes Triples (subject-predicate-object) Suboxone used by injection, negative experience Suboxone injection-causes-Cephalalgia Suboxone used by injection, amount Suboxone injection-dosage amount-2mg Suboxone used by injection, positive experience Suboxone injection-has_side_effect-Euphoria DIVERSE DATA TYPES ENTITIES DOSAGE PRONOUN INTERVAL Route of Admin. RELATIONSHIPS SENTIMENTS Drug Abuse Ontology (DAO) 83 Classes 37 Properties 33:1 Buprenorphine 24:1 Loperamide 30
    31. 31. PREDOSE: Smarter Data through Shared Context and Ontology. Extraction resources: Lexicon, Lexico-ontology, Rule-based Grammar. Data types with examples: ENTITIES: Suboxone, Kratom, Heroin. TRIPLES: Suboxone-CAUSE-Cephalalgia. EMOTION: disgusted, amazed, irritated. INTENSITY: more than, a, few of. PRONOUN: I, me, mine, my. SENTIMENT: Im glad, turn out bad, weird. DRUG-FORM: ointment, tablet, pill, film. ROUTE OF ADM: smoke, inject, snort, sniff. SIDEEFFECT: Itching, blisters, flushing, shaking hands, difficulty breathing. DOSAGE: <AMT><UNIT> (e.g. 5mg, 2-3 tabs). FREQ: <AMT><FREQ_IND><PERIOD> (e.g. 5 times a week). INTERVAL: <PERIOD_IND><PERIOD> (e.g. several years). Data Integration 31
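The three grammar rules on slide 31 (DOSAGE, FREQ, INTERVAL) map naturally onto regular expressions. The sketch below is one possible reading of those rules, not the PREDOSE grammar itself. The unit list, the period list, and the indicator words ("times a", "several", and so on) are assumptions chosen to cover the slide's own examples.

```python
import re

# Building blocks. The word lists are illustrative assumptions.
AMT = r"\d+(?:\.\d+)?(?:\s*-\s*\d+(?:\.\d+)?)?"      # "5", "2-3", "0.5"
UNIT = r"(?:mg|milligrams?|tabs?|tablets?|pills?)"
PERIOD = r"(?:day|week|month|year)s?"

# DOSAGE: <AMT><UNIT>, e.g. "5mg", "2-3 tabs"
DOSAGE = re.compile(rf"\b({AMT})\s*({UNIT})\b", re.IGNORECASE)
# FREQ: <AMT><FREQ_IND><PERIOD>, e.g. "5 times a week"
FREQ = re.compile(rf"\b({AMT})\s+times\s+(?:a|per)\s+({PERIOD})\b",
                  re.IGNORECASE)
# INTERVAL: <PERIOD_IND><PERIOD>, e.g. "several years"
INTERVAL = re.compile(rf"\b(several|few|many)\s+({PERIOD})\b", re.IGNORECASE)


def extract(text):
    """Return every dosage, frequency, and interval mention found in text."""
    return {
        "dosage": [m.group(0) for m in DOSAGE.finditer(text)],
        "frequency": [m.group(0) for m in FREQ.finditer(text)],
        "interval": [m.group(0) for m in INTERVAL.finditer(text)],
    }


result = extract("I took 5mg, then 2-3 tabs, 5 times a week for several years.")
```

Keeping each rule as a named pattern mirrors the slide's `<AMT><UNIT>` notation and makes it easy to extend a word list without touching the others.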
    32. 32. PREDOSE: Role of Semantic Web and Ontologies Data Type Semantic Web Technique Limitations of Other Approaches Entity Ontology-driven Identification & Normalization ML/NLP IR Requires Labeled Data Unpredictable term frequencies Triple Schema-driven Difficult to develop language model Requires entity disambiguation Sentiment Ontology-assisted Target Entity Resolution Inconsistent data for Parse Trees or rules Diverse simple & complex slang terms & phrases 32
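Ontology-driven identification and normalization, as named on slide 32, can be pictured as a lookup from surface forms (brand names and slang) to a canonical drug. The sketch below uses only the relations shown on the Drug Abuse Ontology slide (Suboxone and Subutex as subclasses of Buprenorphine, with "bupe" and "bupey" as slang terms). The "lope" entry and the crude plural handling are assumptions added for illustration.

```python
import re

# Fragment of the relations shown on the Drug Abuse Ontology slide.
SUBCLASS_OF = {"suboxone": "Buprenorphine", "subutex": "Buprenorphine"}
SLANG_OF = {"bupe": "Buprenorphine", "bupey": "Buprenorphine"}
# Assumption: "lope" appears in a quoted forum post; mapping it is a guess.
SLANG_OF["lope"] = "Loperamide"

LOOKUP = {
    **SUBCLASS_OF,
    **SLANG_OF,
    "buprenorphine": "Buprenorphine",
    "loperamide": "Loperamide",
}


def normalize_entities(post):
    """Return (surface form, normalized drug) pairs in order of appearance."""
    pairs = []
    for token in re.findall(r"[A-Za-z]+", post):
        key = token.lower()
        if key not in LOOKUP and key.endswith("s"):
            key = key[:-1]  # crude plural handling, e.g. "Suboxones"
        if key in LOOKUP:
            pairs.append((token, LOOKUP[key]))
    return pairs


pairs = normalize_entities(
    "I was sent home with 5 x 2 mg Suboxones and tried injecting 4 mg of the bupe."
)
```

Because every surface form resolves to one canonical class, mentions can be counted per drug no matter how users refer to it.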
    33. 33. Loperamide-Withdrawal Discovery: Loperamide is used to self-medicate for opioid withdrawal symptoms. “But I just wanted to tell you that loperamide WILL WORK. I take 105 mg of methadone/day, and recently have been running out early due to a renewed interest in IVing that shit. 200mg of lope 100 pills will make me almost 100 again. It brings the sickness down to the level of, say, a minor flu. Sleep returns, restlessness dissipates. Sometimes a mild opiation is felt.” dose of 16 mg per day. For example, web forum participants shared the following opinions: “Back in the day when I would run out of pills early I would take 8-10 Lopermide tabs and get some pretty good relief from w/d.” “So you just stick with it. Don’t go and score big with your next paycheck. Overcome the need to make everything numb. Learn to live with normality for a while. It’ll all seem worthwhile soon enough. Go for a walk. Get out of the house. Go grab some loperamide from the store, the desperate junky’s methadone.” “If you take a shitload of loperamide like 10-20 pills at once in withdrawal, you’ll get relief from some of the physical symptoms. Im not sure exactly how it works, but it’s definitely MORE than just relieving the GI symptoms. Im guessing if you just bombard your blood with it, SOME of it has to make it through? Not sure.” The most commonly discussed side effects of loperamide use were constipation, dehydration and other types of gastrointestinal discomforts. Some also reported mild withdrawal symptoms from using loperamide for an extended period of time. “Normally around 100 milligrams of loperamide will get me out of withdrawals.” “Loperamide is good for a day or two but the problem is on loperamide I lose all desire to eat OR drink, or do anything really.” “Loperamide alone is enough to keep me well without being miserable, IF I megadose.” “This loperamide has saved my life during w/ds.... and made me even more careless with my monthly meds.” “I used to sing the praises of loperamide....and still do, as a short term standby until you can score. Long term maintenance, it really wears you out. Starts to “feel” toxic though I doubt it actually is toxic... After a few days i would get severely dehydrated because it
    34. 34. 34 EMR and clinical text analysis: Intelligence from clinical data Contact: Sujan Perera
    35. 35. • Active Semantic EMR: high quality, low error, faster completion of patient records • Predicting patient outcomes and advising discharge decisions based on both structured (billing) data and clinical text (unstructured data) • Deep understanding of clinical text for Computer Assisted Coding for ICD9 and ICD10 and Computerized Document Improvement (commercial products from ezDI) 35
    36. 36. Explanation Module Explained? Yes No D D D Hypothesis Filtering Hypothesis Generation Hypothesis with High Confidence D D D Patient Notes UMLS Semantic Driven Approach for Knowledge Acquisition from EMRs
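The loop on slide 36 (explain, generate hypotheses, filter) can be sketched in a few lines. This is a toy reading of the speaker notes, not the published method. The `KNOWN` table stands in for UMLS-style background knowledge, and using a co-occurrence count threshold as the confidence filter is an assumption, since the notes do not say how confidence is computed.

```python
from collections import Counter

# Hypothetical background knowledge: disorder -> symptoms it is known to explain.
KNOWN = {
    "hypertension": {"headache"},
    "atrial fibrillation": {"palpitations"},
}


def unexplained(symptoms, disorders):
    """Symptoms that no disorder in the same document is known to explain."""
    explained = set()
    for disorder in disorders:
        explained |= KNOWN.get(disorder, set())
    return [s for s in symptoms if s not in explained]


def generate_hypotheses(documents):
    """Every disorder in a document is a candidate explanation for each
    unexplained symptom; count how often each candidate link recurs."""
    counts = Counter()
    for symptoms, disorders in documents:
        for symptom in unexplained(symptoms, disorders):
            for disorder in disorders:
                counts[(disorder, symptom)] += 1
    return counts


def filter_hypotheses(counts, min_support):
    """Keep candidate links seen in at least min_support documents
    (an assumed stand-in for the confidence filter)."""
    return sorted(pair for pair, n in counts.items() if n >= min_support)


documents = [
    (["syncope", "palpitations"], ["atrial fibrillation"]),
    (["syncope"], ["atrial fibrillation", "hypertension"]),
    (["syncope"], ["atrial fibrillation"]),
]
hypotheses = filter_hypotheses(generate_hypotheses(documents), min_support=2)
```

In this toy corpus the surviving hypothesis links syncope to atrial fibrillation, the same relation that slide 39 shows as domain knowledge.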
    37. 37. Semantics enhanced NLP Deep clinical text analysis using semantics enhanced NLP has enabled our industry partner ezDI to develop exciting commercial products: ezCDI (Computerized Document Improvement) and ezCAC (Computer Assisted ICD9/ICD10 Coding) See: 37
    38. 38. Semantics enhanced NLP • Typical NLP algorithms misclassify linguistic nuances • Document 1: • Coronary artery disease listed in the current diagnosis list • “Send for carotid duplex to rule out carotid artery stenosis given his risk factors and underlying coronary artery disease.“ (NLP output says patient does not have coronary artery disease) • Document 2: • “Extremities : Warm and dry. No clubbing or cyanosis. No lower extremity edema.“ • “I have advised the patient on the side effect of potential lower extremity edema.“ (NLP output says patient has lower extremity edema) • Document 3 • “He is not having any symptoms of chest pain or exertional syncope or dizziness.” • “I advised him that if he experiences chest pain, shortness of breath with exertion or dizziness or syncopal episodes to let us know and we can do appropriate workup.” (NLP output says patient has chest pain, shortness of breath, dizziness, syncopal) Green - correctly identified entities Red – misclassified entities 38
    39. 39. Semantics enhanced NLP • Domain knowledge can be used to resolve misclassifications Symptoms Medication Syncope Atrial Fibrillation Is_symptom_of Warfarin Medication Atenolol Medication Is_medication_for Aspirin • There is strong evidence to suggest that the patient has Atrial Fibrillation. 39
    40. 40. Raw Text to Knowledge antihypertensive valsartan agent kidney disease renal insufficiency Patient taking atenolol for hypertension He is off both Diovan and Lotrel. I am unsure if it is due to underlying renal insufficiency. He has actually been on atenolol alone for his hypertension. Inference Knowledge Concepts Raw Text diovan lotrel renal insufficiency atenolol hypertension valtuna diovan atenolol atenix tenomin kidney failure disorder blood pressure disorder hypertension systolic hypertension pulmonary hypertension Patient has kidney disease Patient is on antihypertensive drugs is used to treat is a drug disorder 40
    41. 41. ezHealth Platform ezCAC ezFIND ezMeasure ezCDI ezNLP cTAKES ezKB <problem value="Asthma" cui="C0004096"/> <med value="Losartan" code="52175:RXNORM" /> <med value="Spiriva" code="274535:RXNORM" /> <procedure value="EKG" cui="C1623258" /> 41
    42. 42. 42 Online Health Information Seeking Contact: Ashutosh Jadhav
    43. 43. Internet Users in the World: around 3 billion (40% of the world population). Internet users in the US: around 300 million (87% of the US population) 43
    44. 44. Online Health Information Seeking • Online health resources – Easily accessible – Helps to obtain medical information quickly, conveniently – Can help non-experts to make more informed decisions – Play a vital role in improving health literacy 44
    45. 45. Online Health Information Seeking • With the growing availability of online health resources, consumers are increasingly using the Internet to seek health related information According to a 2013 Pew Survey*, one in three American adults has gone online to find information about a specific medical condition. *Fox S, Duggan M. Pew Internet & American Life Project. 2013. Health online 2013 45
    46. 46. Online Health Information Seeking • One of the most common ways to seek online health Information is via Web search engines such as Google, Yahoo! and Bing According to the Pew Survey, approximately 8 in 10 online health inquiries initiate from a search engine. Fox S, Duggan M. Pew Internet & American Life Project. 2013. Health online 2013 46
    47. 47. Motivation • Analyzing health search log – Helps to understand population level health information needs – How users formulate search queries (“expression of information need”) – availability of potentially larger, cohorts of real users and their behaviors, e.g. querying behaviors • Such knowledge can be applied – to improve the health search experience – to develop next-generation knowledge and content delivery systems 47
    48. 48. Online Health Information Seeking Smart Devices Personal Computers vs. Jadhav A et al. “Comparative Analysis of Online Health Queries Originating From Personal Computers and Smart Devices on a Consumer Health Information Portal” Journal of Medical Internet Research 2014;16(7):e160 (Impact factor 3.8)
    49. 49. Desktop Mobile Mobile usage takes Over Motivation
    50. 50. Motivation • With the recent exponential increase in usage of smart devices, the percentage of people using smart devices to search for health information is also growing rapidly
    51. 51. Motivation • Experience of online information searching varies depending on the device used – Smart devices (SDs) : mobile, tablets – Personal computers (PCs): desktop, laptop • PCs and SDs have distinct characteristics – Readability, user experience, accessibility, etc.
    52. 52. Study Objective • In order to improve the health information searching process and to be prepared for technology shift, it is necessary – to understand how device choice influences online health information seeking
    53. 53. • Data: Dataset Creation – Health search queries – launched from PCs and SDs – submitted from Web search engines – and directed users to Mayo Clinic’s consumer health information portal ( • Data timeframe: – June 2011 to May 2013 • Data collection tool: – IBM NetInsight On Demand (Web Analytics tool) • Dataset size: – More than 100 million health search queries for both PCs and SDs
    54. 54. Comparative Data Analysis • For PCs and SDs, we analyzed and compared – Frequently searched health categories – Types of search queries (keyword-based, Wh-questions, Yes/No questions) – Structural properties of the queries • Length of the search queries • Usage of the search query operators • Usage of special characters – Misspellings in the health search queries – Linguistic characteristics of the queries
    55. 55. Intent Mining for Health Information Seeking • The most-searched health categories are ‘Symptoms’ (1 in 3 search queries), ‘Causes’ and ‘Treatments & Drugs’ • One of the least searched health categories is “Prevention” • The distribution of search queries for different health categories differs with the device used for search • Search queries from both PCs and SDs follow a similar pattern for the distribution of the search queries between health categories
    56. 56. Intent Expression: Search Query Type • Health queries are predominantly formulated using keywords (~85%), followed by Wh- and Yes/No questions • Users ask more health questions from SDs than from PCs • In the health search queries, users ask more – “what”, “how” questions => descriptive information need – “can”, “is” and “does” questions => factual information need
    57. 57. Intent Expression: Search Query Length • Average length of the queries from SDs (3.29 words and 18.86 characters) is a bit longer than that of PCs (2.9 words and 17.61 characters) • Health queries tend to be longer than general search queries, indicating users’ interest in more specific information
    58. 58. Online Health Information Seeking for Cardiovascular Diseases Jadhav A et al."What Information about Cardiovascular Diseases do People Search Online?”, 25th European Medical Informatics Conference (MIE 2014), Istanbul, Turkey, August 31 - Sept 3, 2014. Jadhav A et al. "Online Information Searching for Cardiovascular Diseases: An Analysis of Mayo Clinic Search Query Logs” AMIA 2014 Annual Symposium, Washington DC, Nov 15-19, 2014
    59. 59. Motivation • According to CDC, in the United States – CVD is one of the most common chronic diseases – the leading cause of death (1 in every 4 deaths) • CVD is common across all socioeconomic groups and demographics • Most of the CVDs require lifelong care and the patient is in charge of managing the disease through self-care • Online health resources are “significant information supplement” for the patients with chronic conditions 59
    60. 60. Motivation • Although chronic diseases affect a large population, very few prior studies have investigated online health information searching exclusively for chronic diseases, and especially for CVD. • In this study, we address this knowledge gap in the community – by performing population-level intention mining for online health information seeking 60
    61. 61. • Data: Dataset Creation – CVD related search queries – submitted from Web search engines – and directed users to Mayo Clinic’s consumer health information portal ( • Data timeframe: – September 2011 to August 2013 • Data collection tool: – IBM NetInsight On Demand (Web Analytics tool) • Dataset size: – 10 million CVD related search queries, which is a significantly large dataset for a single class of diseases. 61
    62. 62. Research Problem • Identification of users’ intent for health information seeking • For example (Search Query → Health Category): Heart palpitations with headache → Symptoms; Tylenol raise blood pressure → Medication, Vital sign; Pump for pulmonary hypertension → Medical device, Disease; Red wine heart disease → Food, Disease; Bypass surgery → Treatment 62
    63. 63. Intent Mining for Online Health Information Seeking • Using background knowledge to develop a rule-based classification approach – Using UMLS MetaMap and based on UMLS concepts and semantic types – To categorize CVD search queries into 14 “consumer oriented” health categories – Precision: 88.42%, Recall: 86.07% and F-Score: 0.8723 63
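The three figures on slide 63 are mutually consistent if the F-score is the usual harmonic mean of precision and recall. That reading is an assumption, since the slide does not name the formula, but a quick check bears it out:

```python
def f_score(precision, recall):
    """Balanced F-score (F1): harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported on the slide, expressed as fractions.
f1 = f_score(0.8842, 0.8607)
rounded = round(f1, 4)  # agrees with the reported 0.8723
```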
    64. 64. Methods Overview 64
    65. 65. Intent Mining for Health Information Seeking: Association Rules for Categorization
    66. 66. Intent Mining for Health Information Seeking: Categorization Results • One in every two searches is related to either ‘Diseases and Conditions’ or ‘Vital signs’. • Other popular health categories that users search for include ‘Symptoms’, ‘Living with’, ‘Treatments’, ‘Food and Diet’ and ‘Causes’. • Although CVD can be prevented with some lifestyle and diet changes, interestingly very few OHISs search for CVD ‘Prevention’.
    67. 67. Intent Mining for Health Information Seeking: Categorization Results • A search query can be categorized into zero, one or more health categories • Using our categorization approach, we categorized 92% of the 10 million CVD related queries into at least one health category • Most of the queries (around 88%) are categorized into either one or two categories • Very few CVD queries (4.28%) are categorized into 3 or more categories.
    68. 68. Top CVD Search Queries • Most of the top search queries are related to major CVD diseases and conditions. • At the same time, queries about blood pressure (high/low) and heart rate are also searched frequently.
    69. 69. Intent Expression: Search Query Length • Average search query length for CVD is 3.88 words and 22.22 characters • Around 80% of the CVD search queries have 3 or more words. • The analysis implies that CVD search queries are longer than previously reported non-medical as well as medical queries • Longer search queries also indicate users’ interest in more specific information about the disease; users use more words to narrow down to a particular health topic.
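The length statistics above could be computed along these lines. This is a sketch under stated assumptions: words are taken to be whitespace-separated tokens, characters are counted including spaces (the slide does not say which convention was used), and the queries are toy examples rather than the Mayo Clinic data.

```python
def query_length_stats(queries):
    """Average words, average characters, and share of queries with 3+ words."""
    n = len(queries)
    word_counts = [len(q.split()) for q in queries]  # whitespace tokens
    char_counts = [len(q) for q in queries]          # spaces included
    return {
        "avg_words": sum(word_counts) / n,
        "avg_chars": sum(char_counts) / n,
        "share_3_plus_words": sum(1 for w in word_counts if w >= 3) / n,
    }


# Toy queries for illustration only.
toy_queries = [
    "bypass surgery",
    "red wine heart disease",
    "tylenol raise blood pressure",
    "heart palpitations with headache",
]
stats = query_length_stats(toy_queries)
print(stats)
```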
    70. 70. Intent Expression: Search Query Types • Users predominantly formulate search queries using keywords (80%), though queries with Wh-questions are also significant • Few queries (2.5%) are formulated as Yes/No questions • In Wh-questions, OHISs mostly use “How” and “What”, both of which generally signify that more descriptive information is needed • Yes/No questions are usually used to check factual information. In Yes/No questions, OHISs most often start the search query with “does”, “can”, or “is”
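A simple way to separate the three query types is to look at the first word of the query. The sketch below is a simplification and not the method used in the study: only "how", "what", "does", "can", and "is" come from the slide, while the remaining cue words, the function name, and the first-word heuristic itself are assumptions.

```python
# Cue words. "how"/"what" and "does"/"can"/"is" are named on the slide;
# the others are assumed additions for illustration.
WH_WORDS = {"how", "what", "why", "when", "where", "which", "who"}
YES_NO_STARTERS = {"does", "do", "can", "is", "are", "should", "will"}


def query_type(query: str) -> str:
    """Classify a query as 'wh-question', 'yes/no-question', or 'keyword'."""
    words = query.lower().split()
    if not words:
        return "keyword"
    first = words[0]
    if first in WH_WORDS:
        return "wh-question"
    if first in YES_NO_STARTERS:
        return "yes/no-question"
    return "keyword"


for q in ["how to lower blood pressure",
          "does tylenol raise blood pressure",
          "bypass surgery"]:
    print(q, "->", query_type(q))
```

A first-word check misses questions whose cue word appears later in the query, so it should be read as a rough baseline only.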
    71. 71. Comparative Analysis of Online Health Information Seeking for Chronic Diseases: Cardiovascular Diseases, Arthritis, Cancer, Diabetes
    72. 72. Analyzing Temporal Patterns in Online Health Information Seeking
    73. 73. Analyzing online information seeking for “Food and Diet” in the context of “Health”
    74. 74. Social Health Signals • Contact: Ashutosh Jadhav
    75. 75. Problem: Identifying Signals from Noise • Every day, millions of health-related tweets are shared • Most of these tweets are highly personal and contextual • Only around 12% of posts are informative* • Keyword-based search doesn't help • Users have to manually identify informative tweets • How to automate the identification of informative content?
    76. 76. Social Health Signals • Present high-quality, reliable and informative health-related information shared over social media by understanding: – Who: who shared the information? (social network user; People Analysis) – Share what: what content is shared? (social media post; Content Analysis) – When: when was the post generated? (Temporal Analysis) – In what context: what is the topic of the message? (Semantic Analysis) – On which channel: to which website is the social media post pointing? (Reliability Analysis) – With what social effect: how many retweets, Facebook likes/shares, and comments does the post have? (Popularity Analysis)
    77. 77. Search and Explore Social Health Signals • Top health news • Faceted search (by health topics)
    78. 78. Ongoing projects
    79. 79. On the drawing board • Stress, obesity/lifestyle disease, chronic diseases • Food and diet in the health context • Keeping elderly at home as long as possible • Clinical research – developing blood test for esophageal cancer detection
    80. 80. Take Away • Kno.e.sis is a truly multidisciplinary, pan-University Center of Excellence where world-class technology/computing expertise comes together with clinical research and applications in health, fitness & wellbeing • Major themes: personalized digital health, patient empowerment, informed patients, epidemiology • More is covered in my talk on Semantic Data enabling Personalized Digital Health
    81. 81. Kno.e.sis – Ohio Center of Excellence in Knowledge-enabled Computing, Wright State University, Dayton, Ohio, USA
    82. 82. Selected References 1. Henson C, Thirunarayan K, Sheth A. An Efficient Bit Vector Approach to Semantics-based Machine Perception in Resource-Constrained Devices. 11th International Semantic Web Conference (ISWC 2012), Boston, Massachusetts, USA, November 11-15, 2012. 2. Henson C, Sheth A, Thirunarayan K. Semantic Perception: Converting Sensory Observations to Abstractions. IEEE Internet Computing, vol. 16, no. 2, pp. 26-34, Mar./Apr. 2012, doi: 10.1109/MIC.2012.20. 3. Henson C, Thirunarayan K, Sheth A. An Ontological Approach to Focusing Attention and Enhancing Machine Perception on the Web. Applied Ontology, vol. 6, no. 4, pp. 345-376, 2011. 4. Perera S, Sheth A, Thirunarayan K, Nair S, Shah N. Challenges in Understanding Clinical Notes: Why NLP Engines Fall Short and Where Background Knowledge Can Help. International Workshop on Data management & Analytics for healthcaRE (DARE) at ACM Conference of Information and Knowledge Management (CIKM), pp. 21-26, Burlingame, USA, Nov 1, 2013. 5. Perera S, Henson C, Thirunarayan K, Sheth A, Nair S. Semantics Driven Approach for Knowledge Acquisition From EMRs. IEEE Journal of Biomedical and Health Informatics, vol. 18, no. 2, pp. 515-524, March 2014, doi: 10.1109/JBHI.2013.2282125, PMID: 24058038.
    83. 83. Selected References 6. Cameron D, Smith GA, Daniulaityte R, Sheth A et al. PREDOSE: A Semantic Web Platform for Drug Abuse Epidemiology using Social Media. Journal of Biomedical Informatics, 46(6): 985-997, 2013. PMID: 23892295. 7. Cameron D, Bodenreider O, Yalamanchili H, Danh T et al. A Graph-Based Recovery and Decomposition of Swanson's Hypothesis using Semantic Predications. Journal of Biomedical Informatics, 46(2): 238-251, 2013. 8. Jadhav A, Sheth A, Pathak J. Analysis of Online Information Searching for Cardiovascular Diseases on a Consumer Health Information Portal. American Medical Informatics Association (AMIA) Annual Symposium 2014, Washington DC, November 15-19, 2014. 9. Jadhav A, Andrews D, Fiksdal A, Kumbamu A, McCormick JB, et al. Comparative Analysis of Online Health Queries Originating From Personal Computers and Smart Devices on a Consumer Health Information Portal. J Med Internet Res 2014;16(7):e160. PMID: 25000537. 10. Fiksdal A, Kumbamu A, Jadhav A, Nelsen L, Pathak J, McCormick JB. Evaluating the Process of Online Health Information Searching: A Qualitative Approach to Exploring Consumer Perspectives. In press at J Med Internet Res, 2014. 11. Jadhav A, Wu S, Sheth A, Pathak J. Online Information Seeking for Cardiovascular Diseases: A Case Study from Mayo Clinic. 25th European Medical Informatics Conference (MIE 2014), Istanbul, Turkey, August 31 - Sept 3, 2014.