Big data healthcare


Krishnaprasad Thirunarayan and Amit Sheth: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications, In: Proceedings of AAAI 2013 Fall Symposium on Semantics for Big Data, Arlington, Virginia, November 15-17, 2013.

With the rapid proliferation of mobile phones, social media, and sensors, it is critical to collect and convert big data so generated into actionable information that is relevant for decision making. In this session, we explore challenges and approaches for synthesizing relevant background knowledge and inferences that can enable smart healthcare and ultimately benefit community at large.


  • EVENT: Wright State Honors Institute Symposium “Visions of the Future” on Thursday, March 20, 2014. ABSTRACT: With the rapid proliferation of mobile phones, social media, and sensors, it is critical to collect and convert big data so generated into actionable information that is relevant for decision making. In this session, we explore challenges and approaches for synthesizing relevant background knowledge and inferences that can enable smart healthcare and ultimately benefit community at large.
  • Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications. Big Data Research: Sensor, Social, and Cyber-Physical Systems. Our research through the lens of big data.
  • Statistics in terms of the number of people affected and the costs involved. Heterogeneity: sensor data, social media data, text documents / forum posts, semi-structured Electronic Medical Records. IBM vision: machine-sensed data to human action by distilling the data into nuggets of actionable information and progressively improving decision making by learning. Nature of computational problems to be addressed. Our technical work: Web 3.0.
  • Population of US: 315 million. GDP: $16 trillion. Obama legislation (Affordable Care Act): a hospital will not be reimbursed by Medicare/Medicaid insurance if a patient is readmitted within 30 days. Chronic condition: can we help reduce preventable readmissions? CHF: Congestive Heart Failure.
  • Can we determine causes/potential triggers and predict asthma exacerbation in order to avoid, treat, or control symptoms? COPD: chronic obstructive pulmonary disease.
  • Awareness is important because it impacts overall health. Quantified Self.
  • Quality of life
  • Larry Smarr is a professor at the University of California, San Diego, and he diagnosed himself with Crohn’s disease. He is a pioneer in the area of Quantified Self, which uses sensors to monitor physiological symptoms. Through this self-tracking process he discovered inflammation, which led him to the discovery of Crohn’s disease.
  • EMR: captures information exchanged during a doctor’s visit and test data: disease/symptom/prescribed medications/suggested regimen (PHR: Personal Health Record). Social media engagement: self-reported data from the public at large. A huge amount of raw data is generated by continuous monitoring; what we are lacking is actionable nuggets of information for decision making (treatment/control/avoidance/change in lifestyle). Quantified Self: monitoring for disease diagnosis, severity, and progression. Semantics-based approaches are needed to deal with variety or to transcend abstraction levels.
  • Discovering “unexpected” correlations, and then seeking a transparent basis for them, seems worthy of pursuit. For instance, consider the controversies surrounding assertions such as ‘smoking causes cancer’, ‘high debt causes low growth’, ‘low growth causes high debt’, and ‘religious fanaticism breeds terrorists’.
  • Jeopardy: Watson beat out (crème de la crème) human competitors. Big Data growth is accelerating as more of the world's activity is expressed digitally. Process and make sense of it, and enhance and extend the expertise of humans. Engine light signals/alerts: on detecting an anomaly/problem, flag it for further analysis/action.
  • Size; rate of flow/accumulation and change; (syntactic and semantic) heterogeneity; trustworthiness/quality (signal-to-noise ratio); end use (nuggets of wisdom). Develop techniques to harness data to derive value for decision making in the presence of these challenges.
  • What does semantic perception entail? Making sense of large amounts of low-level data and communicating it in a meaningful way, e.g., ranges, aggregate/statistical measures. Semantic Perception: converting sensory observations to abstractions. Using the perception cycle and domain models: derive explanations, determine focus to disambiguate and discriminate for taking actions. Hybrid reasoning: interleaved abductive and deductive components. [**complex domain models reflecting comorbidities: high-fidelity models**] [**Gleaning patterns from data**] [**Personalization**]
  • Saffir-Simpson Hurricane Wind Scale: hurricane/typhoon/cyclone (5 categories) / tropical storm / tropical depression vs tsunami. National Oceanic and Atmospheric Administration (NOAA).
  • ParkinsonMild(person) = Tremor(person) ∧ PoorBalance(person); ParkinsonModerate(person) = MoveSlow(person) ∧ PoorSleep(person) ∧ MonotoneSpeech(person); ParkinsonAdvanced(person) = Fall(person). Loss of speech / food intake impossible / lack of balance: is there value in continuous monitoring? Signatures for proactive control? Dataset characteristics: 8 weeks of data from 5 sensors on a smartphone, collected for 12 patients, resulting in ~150 GB (with a lot of missing data). Control group vs PD patients distinguished on the basis of restricted motion, monotone speech, etc.
  • Main idea: prior knowledge of PD was used to facilitate its detection from massive sensor data by reducing the search space. Details: Declarative knowledge of PD includes PD severity levels and their symptoms, as shown in the logical rules above. Each PD severity level is a conjunction of a set of PD symptoms. Each symptom was mapped to its manifestation in sensor observations. The availability of declarative knowledge significantly improved the analytics by aiding the feature selection process. The graphs above contrast the physical movements and voice of two control group members and two PD patients.
  • Congestive heart failure / acute decompensated heart failure: weight change due to water retention. Cardiologists evaluate risk based on periodic monitoring data (plus human-sensed health information inputs). Reduce preventable readmissions: 25% of patients are readmitted within 30 days of discharge; 50% of patients are readmitted within 6 months.
  • EVIDENCE-BASED approach to diagnosis, treatment, and control (IRB). Environmental: CO, CO2, NO, pollen counts, mold, dust, smoke, humidity, temperature, pressure, etc. (Sensordrone, dust-smoke sensor, Air Quality Egg). Physiological: Wheezometer (breathing), heart rate, etc. 25 million people in the U.S. are diagnosed with asthma (7 million are children). 300 million people suffer from asthma worldwide. Asthma-related healthcare costs alone are around $50 billion a year. 155,000 hospital admissions and 593,000 emergency department visits in 2006.
  • Volume: (1) semantic perception (2) parallelism
  • “An Efficient Bit Vector Approach to Semantics-Based Machine Perception in Resource-Constrained Devices.” Resources: memory, CPU, power, … Healthcare use case: privacy, mobility, cheap onboard sensors, personalization, power, and convenience considerations dominate. Abstracting and summarizing multimodal machine-sensed observations plus human observations for actionable and human-accessible situational awareness and decision making. Characteristics of a big data problem: the size of the data exceeds the resources available/needed to compute.
  • The perception cycle contains interleaved, iterative execution of two primary phases. Explanation (abductive): translating low-level signals into high-level abstractions; inference to the best explanation. Discrimination (deductive): focusing attention on those properties that will help distinguish between multiple possible explanations; used to intelligently task sensors and collect additional observations (rather than the brute-force approach of blindly collecting all observations). Ask the human relevant questions.
  • Observe the units on the x and y axes: small vs large problem size; small vs large amount of time. A step curve, as opposed to a linear one, reflects allocation in quanta of 1 word (32 or 64 bits). The size of the graph is plotted in terms of the number of nodes, as we hold one of feature/property fixed; otherwise, the size of the graph is O(n^2).
  • Research on asthma has three phases. Data collection: what signals to collect? Analysis: what analysis is to be done? Actionable information: what action to recommend? In the next slide, we take a peek into the analysis that we do for asthma.
  • Syntactic: different data formats. Semantic: conceptual models. Semantic: multimodal sensing + different conceptual models. Complementary and corroborative information => complete and reliable/robust. “Semantics Empowered Web 3.0” book.
  • Semantics at different levels of detail and developed in stages: ease of use by domain experts; faster and wider adoption, promoting evolution; low upfront cost to support. Shallow semantics has wider applicability to a range of documents/data and appeals to a broader community. Bottom line: “Learn to walk before we run.” Controlled vocabularies <= lightweight ontologies [legacy vocabulary + community-agreed semantic relationships] <= formal ontologies. Original document vs its translation => traceability (provenance). Past research: we have dealt with the top-down UMLS ontology vs bottom-up facts from PubMed in HPCO (literature-based discovery, LBD). RECALL: materials and process specs typically describe the composition, processing, testing, and packaging of material. Formalizing a procedure (a process or a test) as an aggregation of characteristic/parameter-value pairs = LOD; this eventually allows combining and comparing specs. Biomaterials use case: gold surface affinity of peptide sequence.
  • Semantic Perception and Hybrid KRR => event, disease, human-comprehensible features … (e.g., Parkinson’s, asthma). Slow traffic vs the reason for it (accident vs tree fall): semantics to data: sensors monitoring traffic space. Cardiology use case: how a patient is feeling (giddy, depressed, etc.).
  • Idea: glean statistical correlations from data (PGM) and enrich/validate them using symbolic knowledge (manually curated): orient undirected links, delete conflicting links, and complement nodes and links. Explicit declarative knowledge obviates the need to generate it, especially in the context of sparse/skewed data; plus, it will be reliable. Structure learning uncovers qualitative conditional dependencies; integrate these with declarative information using progressively expressive graphical models at the same abstraction level. Parameter learning uses the refined structure to estimate a better-fitting model.
  • Taxonomic : relating and organizing terms : nomenclature
  • E.g., tides and ebbs caused by the alignment of earth, sun, and moon around full moon and new moon; “anomalous” orbits of solar system planets w.r.t. the “circular” motion of stars in geocentric theory (‘planet’ is ‘wanderer’ in Greek), explained by heliocentrism (Copernicus) and the theory of gravitation; correlation of time period and distance of planets (Kepler); and the “anomalous” precession of Mercury’s orbit, clarified by the General Theory of Relativity (Einstein). C-peptide protein can be used to estimate the insulin produced by a patient’s pancreas. ANOMALY (Copernicus) and REGULARITY (Kepler) => CAUSE (Newton) => (Newtonian Mechanics) => (General Theory of Relativity). Bold claims all the time in politics. Beer vs diapers; Walmart’s hurricanes vs Pop-Tarts. (4) Stress/spicy foods are correlated with peptic ulcers, but the latter are caused by Helicobacter pylori, as demonstrated by the Nobel Prize winning work of Marshall and Warren. ORIENTATION UNCLEAR: ‘high debt causes low growth’, ‘low growth causes high debt’. (5) Since the 1950s, both the atmospheric carbon dioxide level and obesity levels have increased sharply. (6) Pavlovian learning-induced conditional reflex, and some of the financial market moves, seem to be classic cases of correlation turning into causation! PARADOXES: THE SEEDS OF PROGRESS. Zeno’s paradox, hydrostatic paradox, light speed constant in all reference frames, CBR, expanding universe, …
  • complementary and corroboratory
  • EMR
  • determined that people who searched for both drugs during the 12-month period were significantly more likely to search for terms related to hyperglycemia than were those who searched for just one of the drugs. They also found that people who did the searches for symptoms relating to both drugs were likely to do the searches in a short time period: 30 percent did the search on the same day, 40 percent during the same week and 50 percent during the same month.
  • Semantic Perception: Hybrid Abductive/Deductive Reasoning (Volume). Cost-benefit trade-off and continuum of semantic models to manage heterogeneity (Variety). Hybrid Knowledge Representation and Reasoning: probabilistic + logical: structure + parameter estimation (Variety).
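
The Parkinson's severity rules given in the notes above (each level is a conjunction of symptoms) can be read as a small rule base. The following is a minimal Python sketch, not the presenters' implementation. The rule contents are copied from the notes; the names `PD_RULES`, `satisfied_levels`, and `relevant_symptoms` are assumptions introduced only for illustration, and the mapping from symptoms to sensor observations is deliberately left out because the notes do not specify it.

```python
from typing import Dict, FrozenSet, Iterable, List

# Severity rules as stated in the speaker notes: each level holds when
# ALL of its listed symptoms hold (a logical conjunction).
PD_RULES: Dict[str, FrozenSet[str]] = {
    "ParkinsonMild": frozenset({"Tremor", "PoorBalance"}),
    "ParkinsonModerate": frozenset({"MoveSlow", "PoorSleep", "MonotoneSpeech"}),
    "ParkinsonAdvanced": frozenset({"Fall"}),
}


def satisfied_levels(observed_symptoms: Iterable[str]) -> List[str]:
    """Return every severity level whose required symptoms are all observed."""
    observed = set(observed_symptoms)
    return [level for level, required in PD_RULES.items() if required <= observed]


def relevant_symptoms() -> List[str]:
    """Symptoms mentioned by any rule.

    This is the sense in which declarative knowledge narrows the search
    space: only these symptoms need to be looked for in the sensor data.
    """
    return sorted(set().union(*PD_RULES.values()))
```

For example, `satisfied_levels({"Tremor", "PoorBalance"})` yields only the mild level, while a lone `"Tremor"` satisfies no rule. Using `relevant_symptoms()` to pick which signals to extract is one way to picture the feature selection benefit described in the notes.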

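The two phases of the perception cycle described in the notes (explanation and discrimination) can be sketched with bit vectors, in the spirit of the bit vector approach the notes mention. This is a hedged illustration, not the published encoding: the knowledge base below is a toy, hypothetical example (it is not medical knowledge), and all names (`FEATURES`, `KB`, `explain`, `discriminate`, `decode`) are assumptions. It also assumes a single feature must account for all observed properties.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical toy knowledge base: property -> features that can exhibit it.
FEATURES: List[str] = ["flu", "cold", "allergy"]
KB: Dict[str, Set[str]] = {
    "fever": {"flu"},
    "cough": {"flu", "cold"},
    "sneezing": {"cold", "allergy"},
    "itchy_eyes": {"allergy"},
}

# One bit per feature; one integer mask per property.
BIT: Dict[str, int] = {f: 1 << i for i, f in enumerate(FEATURES)}
MASK: Dict[str, int] = {p: sum(BIT[f] for f in fs) for p, fs in KB.items()}
ALL_FEATURES: int = (1 << len(FEATURES)) - 1


def explain(observed: Iterable[str]) -> int:
    """Explanation phase: keep the features consistent with every observed property."""
    candidates = ALL_FEATURES
    for prop in observed:
        candidates &= MASK[prop]  # bitwise AND prunes inconsistent features
    return candidates


def discriminate(candidates: int, observed: Iterable[str]) -> List[str]:
    """Discrimination phase: unobserved properties that split the candidate set.

    A property is worth checking only if some, but not all, of the
    remaining candidates would exhibit it.
    """
    seen = set(observed)
    return sorted(
        p for p, m in MASK.items()
        if p not in seen and 0 < (m & candidates) < candidates
    )


def decode(candidates: int) -> List[str]:
    """Turn a candidate bit vector back into feature names."""
    return [f for f in FEATURES if candidates & BIT[f]]
```

With this toy data, observing only `"cough"` leaves two candidate explanations, and `discriminate` suggests which further properties would tell them apart, so sensors (or the person) are asked only about those rather than about everything.
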
    1. T. K. Prasad (Krishnaprasad Thirunarayan), Professor of Computer Science and Engineering, Kno.e.sis – Ohio Center of Excellence in Knowledge-enabled Computing, Wright State University, Dayton, OH-45435. Big Data and Smart Healthcare. Honors Institute Symposium on Visions of the Future
    2. Big Data Processing and Smart Healthcare. Krishnaprasad Thirunarayan (T. K. Prasad), Kno.e.sis – Ohio Center of Excellence in Knowledge-enabled Computing
    3. Outline • Extent and Economics of Healthcare Problem • Nature of Health-related Big Data • Cognitive Computing Goals • Five V’s of Big Data Research • Our Research – Semantic Perception for Scalability – Lightweight Semantics to Manage Heterogeneity – Hybrid Knowledge Representation and Reasoning • Anomaly, Correlation, Causation 03/20/2014 Prasad 3
    4. Acute Decompensated Heart Failure (ADHF) Statistics • Heart failure affects > 5 million people in the US. • > 550,000 new cases are diagnosed each year. • The estimated cost of heart failure in the US for 2008 is $34.8 billion. • Approximately 25% of patients are re-hospitalized within 30 days of discharge. • Approximately 50% of patients are re-hospitalized within 6 months of discharge.
    5. Asthma Statistics • Asthma affects > 25 million people in the US. • > 7 million are children. • The current reactive cost is > $56 billion. • Asthma is the third leading cause of hospitalization with 800,000 emergency room visits among children under the age of 15.
    6. Obesity Statistics • The number of severely obese (BMI ≥ 40) patients quadrupled between 1986 and 2000, from one in 200 to one in 50. • Obesity-related medical treatment costs > $150 billion a year. • Hospitalizations of children and youths with obesity doubled from 1999 to 2005.
    7. Parkinson’s Disease (PD) Statistics • In 2010, 630,000 people in the US had a diagnosis of PD. • The number of people with PD will double by 2040. • Medical costs alone for people with PD total $8.1 billion.
    8. The Patient of the Future (MIT Technology Review, 2012)
    9. Healthcare Related Big Data for Potential Exploitation: Assorted Examples • Sensor data: M. J. Fox Foundation Parkinson disease challenge • Other Applications: The healthcare industry spends roughly $250 billion per year due to fraud.
    10. Structured vs Unstructured Data. Structured (Patient, Disorder, ICD-9 Code): Patient1, Hypertension, 401; Patient2, Atrial fibrillation, 427.31; Patient1, Pulmonary hypertension, 416; Patient3, Edema, 782.3; Patient4, Hyperthyroidism, 242.9. VS Unstructured: “Coronary artery disease, status post four-vessel coronary artery bypass graft surgery on , by Dr. X with a left internal mammary artery to the left anterior descending artery, sequential vein graft to the ramus and first diagonal, and a vein graft to the posterior descending artery. He had normal left ventricular function. He is having some symptoms that are unclear if they are angina or not. I am therefore going to get him scheduled for an exercise Cardiolite stress test.”
    11. Patient Data Distribution (chart: structured data vs unstructured data)
    12. Nature of Processing: Search, Mining, Decision Support, Knowledge Discovery, Prediction; NLP + Semantics
    13. An Example. Raw text: “He is off both Diovan and Lotrel. I am unsure if it is due to underlying renal insufficiency. He has actually been on atenolol alone for his hypertension.” Concepts: diovan, lotrel, renal insufficiency, atenolol, hypertension. Knowledge (concept graph with the relations “is a” and “is used to treat” over drugs and disorders): diovan, valtuna, valsartan, antihypertensive agent; atenolol, tenormin, atenix; kidney failure, renal insufficiency, kidney disease, disorder; blood pressure disorder, hypertension, systolic hypertension, pulmonary hypertension. Inference: patient taking diovan for hypertension; patient has kidney disease; patient is on antihypertensive drugs.
    14. Purpose of Big Data Analytics Vetted by Domain Experts. “Data can help compensate for our overconfidence in our own intuitions and reduce the extent to which our desires distort our perceptions.” -- David Brooks of New York Times. However, inferred correlations require clear justification that they are not coincidental, to inspire confidence.
    15. Cognitive Computing Systems • Leverage Big Data using human experts to enable better decisions. – Process natural language and unstructured data. – Use of Artificial Intelligence (e.g., Machine Learning algorithms) to sense, infer, predict, abduce, and, in some ways, think. Check engine light analogy
    16. Research Challenges: 5 V’s of Big Data: Volume, Velocity, Variety, Veracity, Value. Big Data => Smart Data
    17. Volume: (1) Semantic Perception. Semantic Perception: Volume => Value. Distill voluminous machine-sensed data into human-comprehensible nuggets necessary for decision making using background knowledge
    18. Parkinson’s Disease Use Case
    19. Heart Failure Use Case
    20. Asthma Use Case
    21. Volume: (2) Exploiting Embarrassing Parallelism
    22. Volume with a Twist: Resource-constrained reasoning on mobile devices
    23. Cory Henson’s Thesis Statement: Machine perception can be formalized using semantic web technologies to derive abstractions from sensor data using background knowledge on the Web, and efficiently executed on resource-constrained devices.
    24. Perception Cycle* that exploits background knowledge / domain models. Observe Property, Perceive Feature. (1) Explanation: abstracting raw data for human comprehension. (2) Discrimination: focus generation for disambiguation and action (incl. human in the loop). Prior Knowledge. * Based on Neisser’s cognitive model of perception
    25. Efficiency Improvement (evaluation on a mobile device; plot labels: O(n^3) < x < O(n^4) and O(n)) • Problem size increased from 10’s to 1000’s of nodes • Time reduced from minutes to milliseconds • Complexity reduced from polynomial to linear
    26. kHealth: Health Signal Processing Architecture. Inputs: personal-level signals, public-level signals, population-level signals, events from social streams. Stages: data acquisition & aggregation, analysis, personalized actionable information. Supporting elements: domain experts, domain knowledge, risk model. Example outputs: take medication before going to work; avoid going out in the evening due to high pollen levels; contact doctor
    27. kHealth Demo • kHealth:
    28. Variety: Syntactic and semantic heterogeneity • in textual and sensor data, • in social media and Web forums data, • in Electronic Medical Records
    29. Variety (How?): (1) Granularity of Semantics & Applications • Lightweight semantics: File and document-level annotation to enable discovery and sharing • Richer semantics: Data-level annotation and extraction for semantic search and summarization • Fine-grained semantics: Data integration, interoperability and reasoning in Linked Open Data. Cost-benefit trade-off and continuum
    30. Variety (How?): (2) Hybrid KRR. Blending data-driven models with declarative knowledge – Data-gleaned models: Bottom-up, correlation-based, statistical – Expert-given KBs: Top-down, causal/taxonomical, logical – Refine structure to better estimate parameters. E.g., Medical Data Analytics using PGMs + KBs
    31. Veracity: Scalable and Agile Big Data Analytics cannot deliver value unless we have confidence and trust in our data. Open Problem: Develop expressive frameworks for trust to make explicit all aspects that go into trust formation and inferences.
    32. Veracity: Confession of sorts! Trust is well-known, but is not well-understood. “The utility of a notion testifies not to its clarity but rather to the philosophical importance of clarifying it.” -- Nelson Goodman (Fact, Fiction and Forecast, 1955)
    33. (More on) Value: Discovering gaps and enriching domain models using data. E.g., Semantics Driven Approach for Knowledge Acquisition from EMRs
    34. (More on) Value: Discovering drug-drug interaction by analyzing search query logs • E.g., the antidepressant paroxetine and the cholesterol-lowering drug pravastatin were shown to interact, causing high blood sugar, through correlated searches for “hyperglycemia”, “high blood sugar”, or “blurry vision”.
    35. Conclusions • Glimpse of our research organized around the 5 V’s of Big Data • Discussed role in harnessing Value – Semantic Perception (Volume) – Continuum of Semantic models to manage Heterogeneity (Variety) – Hybrid KRR: Probabilistic + Logical (Variety) – Trust Models (Veracity)
    36. Thank you, and please visit us at the Department of Computer Science and Engineering, Wright State University, Dayton, Ohio, USA. Kno.e.sis: Ohio Center of Excellence in Knowledge-enabled Computing. Special thanks to: Pramod Anantharam, Sujan Perera, Dr. Cory Henson, Professor Amit Sheth
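
The hybrid KRR idea from the slide on blending data-driven models with declarative knowledge (orient undirected links, delete conflicting links, complement missing links) can be sketched as a simple structure-refinement step. This is a minimal Python sketch under stated assumptions, not the presenters' method: the function name `refine_structure`, the edge representation, and the toy inputs are all hypothetical, and the example links merely reuse topics mentioned in the notes. No actual structure learning or parameter learning is performed here.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

Edge = Tuple[str, str]  # (cause, effect) when directed


def refine_structure(
    learned_links: Iterable[Edge],
    causal_kb: Set[Edge],
    ruled_out: Set[FrozenSet[str]],
) -> Dict[str, set]:
    """Refine undirected, data-gleaned links using curated knowledge.

    learned_links: undirected correlations (pair order is arbitrary).
    causal_kb:     directed (cause, effect) links asserted by experts.
    ruled_out:     pairs the experts consider spurious.
    """
    oriented, unresolved, deleted = set(), set(), set()
    for a, b in learned_links:
        pair = frozenset((a, b))
        if pair in ruled_out:
            deleted.add(pair)        # delete a conflicting link
        elif (a, b) in causal_kb:
            oriented.add((a, b))     # orient using the knowledge base
        elif (b, a) in causal_kb:
            oriented.add((b, a))
        else:
            unresolved.add(pair)     # correlation with unclear orientation
    added = causal_kb - oriented     # complement: known links the data missed
    return {"oriented": oriented, "added": added,
            "deleted": deleted, "unresolved": unresolved}


# Toy, hypothetical inputs for illustration only.
LEARNED = [("cancer", "smoking"), ("stress", "peptic_ulcer"), ("co2_level", "obesity")]
CAUSAL_KB = {("smoking", "cancer"), ("h_pylori", "peptic_ulcer")}
RULED_OUT = {frozenset({"co2_level", "obesity"})}

RESULT = refine_structure(LEARNED, CAUSAL_KB, RULED_OUT)
```

In a fuller pipeline, the refined structure would then be handed to parameter learning, which is the "refine structure to better estimate parameters" step named on the slide.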