Successfully reported this slideshow.

Smart Data for you and me: Personalized and Actionable Physical Cyber Social Big Data

2,685 views

Published on

Featured Keynote at Worldcomp'14, July 2014: http://www.world-academy-of-science.org/worldcomp14/ws/keynotes/keynote_sheth

Video of the talk at: http://youtu.be/2991W7OBLqU

Big Data has captured a lot of interest in industry, with the emphasis on the challenges of the four Vs of Big Data: Volume, Variety, Velocity, and Veracity, and their applications to drive value for businesses. Recently, there is rapid growth in situations where a big data challenge relates to making individually relevant decisions. A key example is human health, fitness, and well-being. Consider for instance, understanding the reasons for and avoiding an asthma attack based on Big Data in the form of personal health signals (e.g., physiological data measured by devices/sensors or Internet of Things around humans, on the humans, and inside/within the humans), public health signals (information coming from the healthcare system such as hospital admissions), and population health signals (such as Tweets by people related to asthma occurrences and allergens, Web services providing pollen and smog information, etc.). However, no individual has the ability to process all these data without the help of appropriate technology, and each human has different set of relevant data!

In this talk, I will forward the concept of Smart Data that is realized by extracting value from Big Data, to benefit not just large companies but each individual. If I am an asthma patient, for all the data relevant to me with the four V-challenges, what I care about is simply, “How is my current health, and what is the risk of having an asthma attack in my personal situation, especially if that risk has changed?” As I will show, Smart Data that gives such personalized and actionable information will need to utilize metadata, use domain specific knowledge, employ semantics and intelligent processing, and go beyond traditional reliance on ML and NLP.

For harnessing volume, I will discuss the concept of Semantic Perception, that is, how to convert massive amounts of data into information, meaning, and insight useful for human decision-making. For dealing with Variety, I will discuss experience in using agreement represented in the form of ontologies, domain models, or vocabularies, to support semantic interoperability and integration. For Velocity, I will discuss somewhat more recent work on Continuous Semantics, which seeks to use dynamically created models of new objects, concepts, and relationships, using them to better understand new cues in the data that capture rapidly evolving events and situations.

Smart Data applications in development at Kno.e.sis come from the domains of personalized health, energy, disaster response, and smart city. I will present examples from a couple of these.

Published in: Data & Analytics
  • Be the first to comment

Smart Data for you and me: Personalized and Actionable Physical Cyber Social Big Data

  1. 1. Smart Data for you and me: Personalized and Actionable Physical Cyber Social Big Data Put Knoesis Banner Keynote at WorldComp 2014, July 21, 2014 Amit Sheth LexisNexis Ohio Eminent Scholar & Exec. Director, The Ohio Center of Excellence in Knowledge-enabled Computing (Kno.e.sis) Wright State, USA
  2. 2. BIG Data 2014 2 http://hrboss.com/hiringboss/articles/big-data-infographic
  3. 3. Only 0.5% to 1% of the data is used for analysis. 3http://www.csc.com/insights/flxwd/78931-big_data_growth_just_beginning_to_explode http://www.guardian.co.uk/news/datablog/2012/dec/19/big-data-study-digital-universe-global-volume
  4. 4. Variety – not just structure but modality: multimodal, multisensory Semi structured 4
  5. 5. Velocity Fast Data Rapid Changes Real-Time/Stream Analysis Current application examples: financial services, stock brokerage, weather tracking, movies/entertainment and online retail 5
  6. 6. 6 What has changed now? About 2 billion of the 5+ billion have data connections – so they perform “citizen sensing”. And there are more devices connected to the Internet than the entire human population. These ~2 billion citizen sensors and 10 billion devices & objects connected to the Internet makes this an era of IoT (Internet of Things) and Internet of Everything (IoE). http://www.cisco.com/web/about/ac79/docs/innov/IoT_IBSG_0411FINAL.pdf
  7. 7. 7 “The next wave of dramatic Internet growth will come through the confluence of people, process, data, and things — the Internet of Everything (IoE).” - CISCO IBSG, 2013 http://www.cisco.com/web/about/ac79/docs/innov/IoE_Economy.pdf Beyond the IoE based infrastructure, it is the possibility of developing applications that spans Physical, Cyber and the Social Worlds that is very exciting. What has changed now?
  8. 8. 8 What has not changed? We need computational paradigms to tap into the rich pulse of the human populace, and utilize diverse data We are still working on the simpler representations of the real- world! Represent, capture, and compute with richer and fine- grained representations of real-world problems What should change?
  9. 9. 9 Current focus on Big Data is on meeting Enterprise/Company needs. Significant opportunity in applications for individual and community needs. Many of these, esp. in complex domains such as health, fitness and well-being; better disaster coordination, personalized smart energy These need to exploit diverse data types and sources: Physical(sensor/IoT), Cyber(Web) and Social data. Smart data –personalized, contextually relevant, actionable information – provide a better computational paradigm. My take on thinking beyond the Big Data buzz
  10. 10. • Not just data to information, not just analysis, but actionable information, delivering insight and support better decision making right in the context of human activities 10 What is needed? Data Information Actionable: An apple a day keeps the doctor away A blood test has ~30 bio markers…how will a doctor cope with a test with 300K data points?
  11. 11. 11 What is needed? Taking inspiration from cognitive models • Bottom up and top down cognitive processes: – Bottom up: find patterns, mine (ML, …) – Top down: Infusion of models and background knowledge (data + knowledge + reasoning) Left(plans)/Right(perceives) Brain Top(plans)/Bottom(perceives) Brain http://online.wsj.com/news/articles/SB10001424052702304410204579139423079198270
  12. 12. • Ambient processing as much as possible while enabling natural human involvement to guide the system 12 What is needed? Smart Refrigerator: Low on Apples Adapting the Plan: shopping for apples
  13. 13. Makes Sense to a human Is actionable – timely and better decisions/outcomes 13
  14. 14. 15 My 2004-2005 formulation of SMART DATA - Semagix Formulation of Smart Data strategy providing services for Search, Explore, Notify. “Use of Ontologies and Data repositories to gain relevant insights”
  15. 15. Smart Data (2014 retake) Smart data makes sense out of Big data It provides value from harnessing the challenges posed by volume, velocity, variety and veracity of big data, in-turn providing actionable information and improve decision making. 16
  16. 16. OF human, BY human FOR human Smart data is about extracting value by improving human involvement in data creation, processing and consumption. It is about (improving) computing for human experience. Another perspective on Smart Data 17
  17. 17. Petabytes of Physical(sensory)-Cyber-Social Data everyday! More on PCS Computing: http://wiki.knoesis.org/index.php/PCS 18 ‘OF human’ : Relevant Real-time Data Streams for Human Experience
  18. 18. Use of Prior Human-created Knowledge Models 19 ‘BY human’: Involving Crowd Intelligence in data processing Crowdsourcing and Domain-expert guided Machine Learning Modeling
  19. 19. Detection of events, such as wheezing sound, indoor temperature, humidity, dust, and CO level Weather Application Asthma Healthcare Application Close the window at home during day to avoid CO in gush, to avoid asthma attacks at night 20 ‘FOR human’ : Improving Human Experience (Smart Health) Population Level Personal Public Health Action in the Physical World Luminosity CO level CO in gush during day time
  20. 20. Electricity usage over a day, device at work, power consumption, cost/kWh, heat index, relative humidity, and public events from social stream Weather Application Power Monitoring Application 21 ‘FOR human’ : Improving Human Experience (Smart Energy) Population Level Observations Personal Level Observations Action in the Physical World Washing and drying has resulted in significant cost since it was done during peak load period. Consider changing this time to night.
  21. 21. 22 Every one and everything has Big Data – It is Smart Data that matter!
  22. 22. 23 http://www.technologyreview.com/featuredstory/426968/the-patient-of-the-future/ MIT Technology Review, 2012 The Patient of the Future
  23. 23. Physical-Cyber-Social Computing An early 21st century approach to Computing for Human Experience
  24. 24. PCS Computing People live in the physical world while interacting with the cyber and social worlds Physical World Cyber World Social World
  25. 25. 26 Computations leverage observations form sensors, knowledge and experiences from people to understand, correlate, and personalize solutions. Physical- Cyber Social-Cyber Physical-Cyber-Social What if?
  26. 26. Sensors around, on, and in humans will bridge the physical and cyber world. Cyber Physical We believe that current CPS should view the physical world by incorporate solutions form (knowledge) cyber world with a lens of social context. There are silos of knowledge on the cyber world which are under utilized. Social Social networks bridge the social interactions in the physical and cyber world. Mark’s discomfort sensed by: galvanic skin response, heart rate, fitbit, and Microsoft Kinect Physical Cyber Social Computing involves: (1) Comparing physiological observations from people similar to him (age, weight, lifestyle, ethnicity, etc.) (2) Analyzing health experiences of similar people reporting heartburn (3) Incorporating history of ailments of Mark (4) Leveraging medical domain knowledge of diseases and symptoms. •He is advised to visit a doctor since he had a heart condition (from EMR) in the past and heartburns in similar people (social) was a symptom of arterial blockage Mark is experiencing heartburn. Alert to contact his doctor. Physical Sensing Actuating Computing Rich knowledge of the medical domain EMR and PHR Physiological sensor data from human population Health related experiences shared by humans 27 PCS Computing: Health Scenario
  27. 27. 28 Vertical operators facilitate transcending from data- information-knowledge-wisdom using background knowledge Horizontal operators facilitate semantic integration of multimodal and multisensory observations PCS Computing PCS computing is a holistic treatment of data, information, and knowledge from physical, cyber, and social worlds to integrate, understand, correlate, and provide contextually relevant abstractions to humans. Think of PCS Computing as the application/semantic layer for the IoE-based infrastructure. http://wiki.knoesis.org/index.php/PCS
  28. 28. DATA sensor observations KNOWLEDGE situation awareness useful for decision making 29 Primary challenge is to bridge the gap between data and knowledge
  29. 29. 30 What if we could automate this sense making ability? … and do it efficiently and at scale
  30. 30. 31 Making sense of sensor data with Henson et al An Ontological Approach to Focusing Attention and Enhancing Machine Perception on the Web, Applied Ont, 2011
  31. 31. 32 People are good at making sense of sensory input What can we learn from cognitive models of perception? The key ingredient is prior knowledge
  32. 32. * based on Neisser’s cognitive model of perception Observe Property Perceive Feature Explanation Discrimination 1 2 Translating low-level signals into high-level knowledge Focusing attention on those aspects of the environment that provide useful information Prior Knowledge 33 Perception Cycle* Convert large number of observations to semantic abstractions that provide insights and translate into decisions
  33. 33. 34 To enable machine perception, Semantic Web technology is used to integrate sensor data with prior knowledge on the Web W3C SSN XG 2010-2011, SSN Ontology
  34. 34. W3C Semantic Sensor Network (SSN) Ontology Bi-partite Graph 35 Prior knowledge on the Web
  35. 35. W3C Semantic Sensor Network (SSN) Ontology Bi-partite Graph 36 Prior knowledge on the Web
  36. 36. Observe Property Perceive Feature Explanation 1 Translating low-level signals into high-level knowledge 37 Explanation Explanation is the act of choosing the objects or events that best account for a set of observations; often referred to as hypothesis building
  37. 37. Inference to the best explanation • In general, explanation is an abductive problem; and hard to compute Finding the sweet spot between abduction and OWL • Single-feature assumption* enables use of OWL-DL deductive reasoner * An explanation must be a single feature which accounts for all observed properties 38 Explanation Explanation is the act of choosing the objects or events that best account for a set of observations; often referred to as hypothesis building Representation of Parsimonious Covering Theory in OWL-DL
  38. 38. ExplanatoryFeature ≡ ∃ssn:isPropertyOf—.{p1} ⊓ … ⊓ ∃ssn:isPropertyOf—.{pn} elevated blood pressure clammy skin palpitations Hypertension Hyperthyroidism Pulmonary Edema Observed Property Explanatory Feature 39 Explanation Explanatory Feature: a feature that explains the set of observed properties
  39. 39. Observe Property Perceive Feature Explanation Discrimination 2 Focusing attention on those aspects of the environment that provide useful information 40 Discrimination Discrimination is the act of finding those properties that, if observed, would help distinguish between multiple explanatory features
  40. 40. ExpectedProperty ≡ ∃ssn:isPropertyOf.{f1} ⊓ … ⊓ ∃ssn:isPropertyOf.{fn} elevated blood pressure clammy skin palpitations Hypertension Hyperthyroidism Pulmonary Edema Expected Property Explanatory Feature 41 Discrimination Expected Property: would be explained by every explanatory feature
  41. 41. NotApplicableProperty ≡ ¬∃ssn:isPropertyOf.{f1} ⊓ … ⊓ ¬∃ssn:isPropertyOf.{fn} elevated blood pressure clammy skin palpitations Hypertension Hyperthyroidism Pulmonary Edema Not Applicable Property Explanatory Feature 42 Discrimination Not Applicable Property: would not be explained by any explanatory feature
  42. 42. DiscriminatingProperty ≡ ¬ExpectedProperty ⊓ ¬NotApplicableProperty elevated blood pressure clammy skin palpitations Hypertension Hyperthyroidism Pulmonary Edema Discriminating Property Explanatory Feature 43 Discrimination Discriminating Property: is neither expected nor not-applicable
  43. 43. Qualities -High BP -Increased Weight Entities -Hypertension -Hypothyroidism kHealth Machine Sensors Personal Input EMR/PHR Comorbidity risk score e.g., Charlson Index Longitudinal studies of cardiovascular risks - Find risk factors - Validation - domain knowledge - domain expert Find contribution of each risk factor Risk Assessment Model Current Observations -Physical -Physiological -History Risk Score (e.g., 1 => continue 3 => contact clinic) Model CreationValidate correlations Historical observations e.g., EMR, sensor observations 44 Risk Score: from Data to Abstraction and Actionable Information
  44. 44. Use of OWL reasoner is resource intensive (especially on resource-constrained devices), in terms of both memory and time • Runs out of resources with prior knowledge >> 15 nodes • Asymptotic complexity: O(n3) 45 How do we implement machine perception efficiently on a resource-constrained device?
  45. 45. intelligence at the edge Approach 1: Send all sensor observations to the cloud for processing 46 Approach 2: downscale semantic processing so that each device is capable of machine perception
  46. 46. 010110001101 0011110010101 1000110110110 101100011010 0111100101011 000110101100 0110100111 47 Efficient execution of machine perception Use bit vector encodings and their operations to encode prior knowledge and execute semantic reasoning Henson et al. 'An Efficient Bit Vector Approach to Semantics-based Machine Perception in Resource-Constrained Devices, ISWC 2012.
  47. 47. O(n3) < x < O(n4) O(n) Efficiency Improvement • Problem size increased from 10’s to 1000’s of nodes • Time reduced from minutes to milliseconds • Complexity growth reduced from polynomial to linear 48 Evaluation on a mobile device
  48. 48. 2 Prior knowledge is the key to perception Using SW technologies, machine perception can be formalized and integrated with prior knowledge on the Web 3 Intelligence at the edge By downscaling semantic inference, machine perception can execute efficiently on resource-constrained devices 1 Translate low-level data to high-level knowledge Machine perception can be used to convert low-level sensory signals into high-level knowledge useful for decision making 49 Semantic Perception for smarter analytics: 3 ideas to takeaway
  49. 49. • Healthcare: ADFH, Asthma, GI – Using kHealth system • Smart Cities: Traffic management 50 I will use applications in 2 domains to demonstrate • Social Media Analysis*: Crisis coordination Using Twitris platform
  50. 50. kHealth Knowledge-enabled Healthcare To reduce preventable readmissions of patients with chronic heart failure (CHF, specifically ADHF) and GI; Asthma in children 51
  51. 51. Brief Introduction Video
  52. 52. Through physical monitoring and analysis, our cellphones could act as an early warning system to detect serious health conditions, and provide actionable information canary in a coal mine Empowering Individuals (who are not Larry Smarr!) for their own health kHealth: knowledge-enabled healthcare 53
  53. 53. Weight Scale Heart Rate Monitor Blood Pressure Monitor 54 Sensors Android Device (w/ kHealth App) Readmissions cost $17B/year: $50K/readmission; Total kHealth kit cost: < $500 kHealth Kit for the application for reducing ADHF readmission ADHF – Acute Decompensated Heart Failure
  54. 54. 55 1http://www.nhlbi.nih.gov/health/health-topics/topics/asthma/ 2http://www.lung.org/lung-disease/asthma/resources/facts-and-figures/asthma-in-adults.html 3Akinbami et al. (2009). Status of childhood asthma in the United States, 1980–2007. Pediatrics,123(Supplement 3), S131-S145. 25 million 300 million $50 billion 155,000 593,000 People in the U.S. are diagnosed with asthma (7 million are children)1. People suffering from asthma worldwide2. Spent on asthma alone in a year2 Hospital admissions in 20063 Emergency department visits in 20063 Asthma: Severity of the problem
  55. 55. Sensordrone (Carbon monoxide, temperature, humidity) Node Sensor (exhaled Nitric Oxide) 56 Sensors Android Device (w/ kHealth App) Total cost: ~ $500 kHealth Kit for the application for Asthma management *Along with two sensors in the kit, the application uses a variety of population level signals from the web: Pollen level Air Quality Temperature & Humidity
  56. 56. what can we do to avoid asthma episode? 59 Real-time health signals from personal level (e.g., Wheezometer, NO in breath, accelerometer, microphone), public health (e.g., CDC, Hospital EMR), and population level (e.g., pollen level, CO2) arriving continuously in fine grained samples potentially with missing information and uneven sampling frequencies. Variety Volume VeracityVelocity Value What risk factors influence asthma control? What is the contribution of each risk factor? semantics Understanding relationships between health signals and asthma attacks for providing actionable information WHY Big Data to Smart Data: Asthma example
  57. 57. kHealth: Health Signal Processing Architecture Personal level Signals Public level Signals Population level Signals Domain Knowledge Risk Model Events from Social Streams Take Medication before going to work Avoid going out in the evening due to high pollen levels Contact doctor Analysis Personalized Actionable Information Data Acquisition & aggregation 60
  58. 58. 61 Asthma Domain Knowledge Domain Knowledge Asthma Control à Daily Medication Choices for starting therapy Not Well Controlled Poor Controlled Severity Level of Asthma (Recommended Action) (Recommended Action) (Recommended Action) Intermittent Asthma SABA prn - - Mild Persistent Asthma Low dose ICS Medium ICS Medium ICS Moderate Persistent Asthma Medium dose ICS alone Or with LABA/montelukast Medium ICS + LABA/Montelukast Or High dose ICS Medium ICS + LABA/Montelukast Or High dose ICS* Severe Persistent Asthma High dose ICS with LABA/montelukast Needs specialist care Needs specialist care ICS= inhaled corticosteroid, LABA = inhaled long-acting beta2-agonist, SABA= inhaled short-acting beta2-agonist ; *consider referral to specialist Asthma Control and Actionable Information
  59. 59. 62 Patient Health Score (diagnostic) Risk assessment model Semantic Perception Personal level Signals Public level Signals Domain Knowledge Population level Signals GREEN -- Well Controlled YELLOW – Not well controlled Red -- poor controlled How controlled is my asthma?
  60. 60. 63 Patient Vulnerability Score (prognostic) Risk assessment model Semantic Perception Personal level Signals Public level Signals Domain Knowledge Population level Signals Patient health Score How vulnerable* is my control level today? *considering changing environmental conditions and current control level
  61. 61. 67 Sensordrone – for monitoring environmental air quality Wheezometer – for monitoring wheezing sounds Can I reduce my asthma attacks at night? What are the triggers? What is the wheezing level? What is the propensity toward asthma? What is the exposure level over a day? Commute to Work Asthma: Actionable Information for Asthma Patients Luminosity CO level CO in gush during day time Actionable Information Personal level Signals Public level Signals Population level Signals What is the air quality indoors?
  62. 62. 68 Population Level Personal Wheeze – Yes Do you have tightness of chest? –Yes ObservationsPhysical-Cyber-Social System Health Signal Extraction Health Signal Understanding <Wheezing=Yes, time, location> <ChectTightness=Yes, time, location> <PollenLevel=Medium, time, location> <Pollution=Yes, time, location> <Activity=High, time, location> Wheezing ChectTightness PollenLevel Pollution Activity Wheezing ChectTightness PollenLevel Pollution Activity RiskCategory <PollenLevel, ChectTightness, Pollution, Activity, Wheezing, RiskCategory> <2, 1, 1,3, 1, RiskCategory> <2, 1, 1,3, 1, RiskCategory> <2, 1, 1,3, 1, RiskCategory> <2, 1, 1,3, 1, RiskCategory> . . . Expert Knowledge Background Knowledge tweet reporting pollution level and asthma attacks Acceleration readings from on-phone sensors Sensor and personal observations Signals from personal, personal spaces, and community spaces Risk Category assigned by doctors Qualify Quantify Enrich Outdoor pollen and pollution Public Health Health Signal Extraction to Understanding Well Controlled - continue Not Well Controlled – contact nurse Poor Controlled – contact doctor
  63. 63. 73 RDF OWL How are machines supposed to integrate and interpret sensor data? Semantic Sensor Networks (SSN)
  64. 64. 74 W3C Semantic Sensor Network Ontology Lefort, L., Henson, C., Taylor, K., Barnaghi, P., Compton, M., Corcho, O., Garcia-Castro, R., Graybeal, J., Herzog, A., Janowicz, K., Neuhaus, H., Nikolov, A., and Page, K.: Semantic Sensor Network XG Final Report, W3C Incubator Group Report (2011).
  65. 65. 76 W3C Semantic Sensor Network Ontology Lefort, L., Henson, C., Taylor, K., Barnaghi, P., Compton, M., Corcho, O., Garcia-Castro, R., Graybeal, J., Herzog, A., Janowicz, K., Neuhaus, H., Nikolov, A., and Page, K.: Semantic Sensor Network XG Final Report, W3C Incubator Group Report (2011).
  66. 66. SSN Ontology 2 Interpreted data (deductive) [in OWL] e.g., threshold 1 Annotated Data [in RDF] e.g., label 0 Raw Data [in TEXT] e.g., number Levels of Abstraction 3 Interpreted data (abductive) [in OWL] e.g., diagnosis Intellego “150” Systolic blood pressure of 150 mmHg Elevated Blood Pressure Hyperthyroidism …… 78
  67. 67. 79 Making sense of sensor data with
  68. 68. People are good at making sense of sensory input What can we learn from cognitive models of perception? • The key ingredient is prior knowledge 80
  69. 69. * based on Neisser’s cognitive model of perception Observe Property Perceive Feature Explanation Discrimination 1 2 Perception Cycle* Translating low-level signals into high-level knowledge Focusing attention on those aspects of the environment that provide useful information Prior Knowledge 81
  70. 70. To enable machine perception, Semantic Web technology is used to integrate sensor data with prior knowledge on the Web 82
  71. 71. Prior knowledge on the Web W3C Semantic Sensor Network (SSN) Ontology Bi-partite Graph 83
  72. 72. Prior knowledge on the Web W3C Semantic Sensor Network (SSN) Ontology Bi-partite Graph 84
  73. 73. Observe Property Perceive Feature Explanation 1 Translating low-level signals into high-level knowledge Explanation Explanation is the act of choosing the objects or events that best account for a set of observations; often referred to as hypothesis building 85
  74. 74. Discrimination is the act of finding those properties that, if observed, would help distinguish between multiple explanatory features Observe Property Perceive Feature Explanation Discrimination 2 Focusing attention on those aspects of the environment that provide useful information Discrimination 86
  75. 75. Discrimination Discriminating Property: is neither expected nor not-applicable DiscriminatingProperty ≡ ¬ExpectedProperty ⊓ ¬NotApplicableProperty elevated blood pressure clammy skin palpitations Hypertension Hyperthyroidism Pulmonary Edema Discriminating Property Explanatory Feature 87
  76. 76. Semantic scalability: Resource savings of abstracting sensor data 88 Orders of magnitude resource savings for generating and storing relevant abstractions vs. raw observations. Relevant abstractions Raw observations
  77. 77. How do we implement machine perception efficiently on a resource-constrained device? Use of OWL reasoner is resource intensive (especially on resource-constrained devices), in terms of both memory and time • Runs out of resources with prior knowledge >> 15 nodes • Asymptotic complexity: O(n3) 89
  78. 78. intelligence at the edge Approach 1: Send all sensor observations to the cloud for processing Approach 2: downscale semantic processing so that each device is capable of machine perception 90 Henson et al. 'An Efficient Bit Vector Approach to Semantics-based Machine Perception in Resource-Constrained Devices, ISWC 2012.
  79. 79. Efficient execution of machine perception Use bit vector encodings and their operations to encode prior knowledge and execute semantic reasoning 010110001101 0011110010101 1000110110110 101100011010 0111100101011 000110101100 0110100111 91
  80. 80. O(n3) < x < O(n4) O(n) Efficiency Improvement • Problem size increased from 10’s to 1000’s of nodes • Time reduced from minutes to milliseconds • Complexity growth reduced from polynomial to linear Evaluation on a mobile device 92
  81. 81. 2 Prior knowledge is the key to perception Using SW technologies, machine perception can be formalized and integrated with prior knowledge on the Web 3 Intelligence at the edge By downscaling semantic inference, machine perception can execute efficiently on resource-constrained devices Semantic Perception for smarter analytics: 3 ideas to takeaway 1 Translate low-level data to high-level knowledge Machine perception can be used to convert low-level sensory signals into high-level knowledge useful for decision making 93
  82. 82. 94 PCS Computing for Traffic Analytics: for personal and community needs
  83. 83. 96 Duration: 36 months Requested funding: 2.531.202 € CityPulse Consortium City of Aarhus City of Brasov
  84. 84. Vehicular traffic data from San Francisco Bay Area aggregated from on-road sensors (numerical data/Physical), incident reports (textual/Cyber) and Tweets (Social) 97 http://511.org/ Every minute update of speed, volume, travel time, and occupancy resulting in 178 million link status observations, 8 million tweets, 738 active events, and 146 scheduled events with many unevenly sampled observations collected over 3 months. Variety Volume VeracityVelocity Value Can we detect the onset of traffic congestion? Can we characterize traffic congestion based on events? Can we estimate traffic delays in a road network? semantics Representing prior knowledge of traffic lead to a focused exploration of this massive dataset Big Data to Smart Data: Traffic Management example
  85. 85. 98 Heterogeneity leading to complementary observations
  86. 86. Textual Streams for City Related Events 99
  87. 87. City Event Annotation – CRF Annotation Examples Last O night O in O CA... O (@ O Half B-LOCATION Moon I-LOCATION Bay B-LOCATION Brewing I-LOCATION Company O w/ O 8 O others) O http://t.co/w0eGEJjApY O B-LOCATION I-LOCATION B-EVENT I-EVENT O Tags used in our approach: These are the annotations provided by a Conditional Random Field model trained on tweet corpus to spot city related events and location BIO – Beginning, Intermediate, and Other is a notation used in multi-phrase entity spotting 100
  88. 88. Accident Music event Sporting event Road Work Theatre event External events <ActiveEvents, ScheduledEvents> Internal observations <speed, volume, traveTime> Weather Time of Day 101 Modeling Traffic Events: Pictorial representation
  89. 89. Slow moving traffic Link Description Scheduled Event Scheduled Event 511.org 511.org Schedule Information 511.org 102
  90. 90. Domain Experts ColdWeather PoorVisibility SlowTraffic IcyRoad Declarative domain knowledge Causal knowledge Linked Open Data ColdWeather(YES/NO)IcyRoad (ON/OFF) PoorVisibility (YES/NO)SlowTraffic (YES/NO) 1 0 1 1 1 1 1 0 1 1 1 1 1 0 1 0 Domain Observations Domain Knowledge Structure and parameters 103 WinterSeason Correlations to causations using Declarative knowledge on the Semantic Web Combining Data and Knowledge Graph
  91. 91. Traffic jam Link Description Scheduled Event traffic jambaseball game Add missing random variables Time of day bad weather CapableOf slow traffic bad weather Traffic data from sensors deployed on road network in San Francisco Bay Area time of day traffic jambaseball game time of day slow traffic Three Operations: Complementing graphical model structure extraction Add missing links bad weather traffic jambaseball game time of day slow traffic Add link direction bad weather traffic jambaseball game time of day slow traffic go to baseball game Causes traffic jam Knowledge from ConceptNet5 traffic jam CapableOfoccur twice each day traffic jam CapableOf slow traffic 104
  92. 92. City Infrastructure Tweets from a city POS Tagging Hybrid NER+ Event term extraction Geohashing Temporal Estimation Impact Assessment Event Aggregation OSM Locations SCRIBE ontology 511.org hierarchy City Event Extraction City Event Extraction Solution Architecture City Event Annotation OSM – Google Open Street Maps NER – Named Entity Recognition 105
  93. 93. City Events from Sensor and Social Streams can be… • Complementary • Additional information • e.g., slow traffic from sensor data and accident from textual data • Corroborative • Additional confidence • e.g., accident event supporting a accident report from ground truth • Timely • Additional insight • e.g., knowing poor visibility before formal report from ground truth 106
  94. 94. Evaluation – Extracted Events AND Ground Truth Verification Complementary Events Event Sources City events extracted from tweets 511.org, Active events e.g., accidents, breakdowns 511.org, Scheduled events e.g., football game, parade City event extracted from twitter reporting about traffic complementing the road construction event reported on 511.org
  95. 95. Evaluation – Extracted Events AND Ground Truth Verification Corroborative Events Event Sources City events extracted from tweets 511.org, Active events e.g., accidents, breakdowns 511.org, Scheduled events e.g., football game, parade City event from twitter providing corroborative evidence for fog reported by 511.org
  96. 96. Evaluation – Extracted Events AND Ground Truth Verification Event Sources City events extracted from tweets 511.org, Active events e.g., accidents, breakdowns 511.org, Scheduled events e.g., football game, parade City event from twitter providing report of a tornado before an event related to strong winds is reported by 511.org Timeliness
  97. 97. Events from Social Streams and City Department* Corroborative EventsComplementary Events Event Sources City events extracted from tweets 511.org, Active events e.g., accidents, breakdowns 511.org, Scheduled events e.g., football game, parade City event from twitter providing complementary and corroborative evidence for fog reported by 511.org *511.org 110
  98. 98. 111 Actionable Information in City Management Tweets from a CityTraffic Sensor Data OSM Locations SCRIBE ontology 511.org hierarchy Web of Data How issues in a city can be resolved? e.g., what should I do when I have fog condition?
  99. 99. Two excellent videos • Vinod Khosla: the Power of Storytelling and the Future of Healthcare • Larry Smarr: The Human Microbiome and the Revolution in Digital Health 112 Wrapping up: For more on importance of what we talked about
  100. 100. • Big Data is every where – at individual and community levels - not just limited to corporation – with growing complexity: Physical-Cyber-Social • Analysis is not sufficient • Bottom up techniques are not sufficient, need top down processing, need background knowledge 113 Wrapping up: Take Away
  101. 101. Wrapping up: Take Away • Focus on Humans and Improve human life and experience with SMART Data. – Data to Information to Contextually Relevant Abstractions (Semantic Perception) – Actionable Information (Value from data) to assist and support human in decision making. • Focus on Value -- SMART Data – Big Data Challenges without the intention of deriving Value is a “Journey without GOAL”. 114
  102. 102. • Collaborators: Clinicians: Dr. William Abrahams (OSU- Wexner), Dr. Shalini Forbis (Dayton Childrens), Dr. Sangeeta Agrawal (VA), Valerie Shalin (WSU Cognitive Scientists ), Payam Barnaghi (U-Surrey), Ramesh Jain(UCI), … • Funding: NSF (esp. IIS-1111183 “SoCS: Social Media Enhanced Organizational Sensemaking in Emergency Response,”), AFRL, NIH, Industry…. Acknowledgment
  103. 103. Amit Sheth’s PHD students Ashutos h Jadhav* Hemant Purohit Vinh Nguyen Lu Chen Pavan Kapanipathi* Pramod Anantharam* Sujan Perera Maryam Panahiazar Sarasi Lalithsena Shreyansh Batt Kalpa Gunaratna Delroy Cameron Sanjaya Wijeratne Wenbo Wang Special thanks: Ashu. This presentation covers some of the work of my PhD students. Key contributors: Pramod Anantharam, Cory Henson and TK Prasad. 116 Special thanks
  104. 104. • Among top universities in the world in World Wide Web (cf: 10-yr impact, Microsoft Academic Search: among top 10 in June2014) • Among the largest academic groups in the US in Semantic Web + Social/Sensor Webs, Mobile/Cloud/Cognitive Computing, Big Data, IoT, Health/Clinical & Biomedicine Applications • Exceptional student success: internships and jobs at top salary (IBM Watson/Research, MSR, Amazon, CISCO, Oracle, Yahoo!, Samsung, research universities, NLM, startups ) • 100 researchers including 15 World Class faculty (>3K citations/faculty avg) and ~45 PhD students- practically all funded • Extensive research for largely multidisciplinary projects; world class resources; industry sponsorships/collaborations (Google, IBM, …) 117
  105. 105. Top organization in WWW: 10-yr Field Rating (MAS) 118
  106. 106. 119
  107. 107. 120 thank you, and please visit us at http://knoesis.org

×