Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web

  1. 1
  2. 2011 How much data? 48 (2013) 500 (2013) http://www.knowledgeinfusion.com/blog/2011/11/get-your-head-out-of-the-clouds-and-into-big-data/
  3. 1% of the data is used for analysis. 3 http://www.csc.com/insights/flxwd/78931-big_data_growth_just_beginning_to_explode http://www.guardian.co.uk/news/datablog/2012/dec/19/big-data-study-digital-universe-global-volume
  4. Variety Semi structured 4
  5. Velocity Fast Data Rapid Changes Real-Time/Stream Analysis Current application examples: financial services, stock brokerage, weather tracking, movies/entertainment and online retail 5
  6. Current Focus on Big Data • Focus on verticals: advertising, social media, retail, financial services, telecom, and healthcare – Aggregate data, focused on transactions, limited integration (limited complexity), analytics to find (simple) patterns – Emphasis on technologies to handle volume/scale, and to a lesser extent velocity: Hadoop, NoSQL, MPP warehouses, … – Full faith in the power of data (no hypothesis), bottom-up analysis
  7. • What if your data volume gets so large and varied you don't know how to deal with it? • Do you store all your data? • Do you analyze it all? • How can you find out which data points are really important? • How can you use it to your best advantage? 7 Questions typically asked on Big Data http://www.sas.com/big-data/
  8. http://techcrunch.com/2012/10/27/big-data-right-now-five-trendy-open-source-technologies/ Variety of Data Analytics Enablers 8
  9. Illustrative Big Data Applications • Prediction of the spread of flu in real time during H1N1 2009 – Google tested a mammoth 450 million different mathematical models built on search terms, comparing their predictions against the actual flu cases; 45 important parameters were found – The model was tested when the H1N1 crisis struck in 2009 and gave more meaningful and valuable real-time information than any official public health system [Big Data, Viktor Mayer-Schonberger and Kenneth Cukier, 2013] • FareCast: predict the direction of air fares over different routes [Big Data, Viktor Mayer-Schonberger and Kenneth Cukier, 2013] • NY city manholes problem [ICML Discussion, 2012]
  10. What is missing? • Current focus is mainly on serving business intelligence and targeted analytics needs, not on serving complex individual and collective human needs (e.g., empowering humans in health, fitness and well-being; better disaster coordination) that are highly personalized/individualized/contextualized – Incorporate real-world complexity: the multi-modal and multi-sensory nature of the real world and of human perception – Need a deeper understanding of data and its relationship to information (e.g., skew, coverage) • Human involvement and guidance: leading to actionable information, understanding and insight right in the context of human activities – Bottom-up & top-down processing: infusion of models and background knowledge (data + knowledge + reasoning)
  11. Makes Sense Actionable or help decision support/making 11
  12. Smart Data: Smart data makes sense out of Big Data. It provides value from harnessing the challenges posed by the volume, velocity, variety and veracity of big data, in turn providing actionable information and improving decision making.
  13. “OF human, BY human and FOR human” Smart data is focused on the actionable value achieved by human involvement in data creation, processing and consumption phases for improving the human experience. Another perspective on Smart Data 13
  14. Descriptive Exploratory Inferential Predictive Causal Improved Analytics CREATION PROCESSING EXPERIENCE & DECISION MAKING 14 Human Centric Computing
  15. “OF human, BY human and FOR human” Another perspective on Smart Data 15
  16. Petabytes of Physical(sensory)-Cyber-Social Data everyday! More on PCS Computing: http://wiki.knoesis.org/index.php/PCS 16 ‘OF human’ : Relevant Real-time Data Streams for Human Experience
  17. “OF human, BY human and FOR human” 17 Another perspective on Smart Data
  18. Use of Prior Human-created Knowledge Models 18 ‘BY human’: Involving Crowd Intelligence in data processing workflows Crowdsourcing and Domain-expert guided Machine Learning Modeling
  19. “OF human, BY human and FOR human” Another perspective on Smart Data 19
  20. Detection of events, such as wheezing sound, indoor temperature, humidity, dust, and CO2 level Weather Application Asthma Healthcare Application Close the window at home during the day to keep CO2 from rushing in and to avoid asthma attacks at night ‘FOR human’: Improving Human Experience Population Level Personal Public Health Action in the Physical World
  21. 21 Why do we care about Smart Data rather than Big Data?
  22. Transforming Big Data into Smart Data: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web Keynote at SEBD 2013, July 1, 2013, and invited talks at universities in Spain, June 2013. The Ohio Center of Excellence in Knowledge-enabled Computing (Kno.e.sis), Wright State, USA Pavan Kapanipathi Pramod Anantharam Amit Sheth Cory Henson Dr. T.K. Prasad Maryam Panahiazar Contributions by many, but Special Thanks to: Hemant Purohit
  23. Second-costliest hurricane in United States history estimated damage $75 billion 90-115 mph winds State of Emergency in New York 285 people killed on the track of Sandy 750,000 without power (NY) Immense devastation and Human suffering 23 Big Data to Smart Data: Disaster Management example http://www.huffingtonpost.com/2012/10/30/hurricane-sandy-power-outage-map-infographic_n_2044411.html
  24. Social (Big) Data during Hurricane Sandy: 20 million tweets with “sandy, hurricane” keywords between Oct 27th and Nov 1st. 2nd most popular topic on Facebook during 2012. • http://www.guardian.co.uk/news/datablog/2012/oct/31/twitter-sandy-flooding • http://www.huffingtonpost.com/2012/11/02/twitter-hurricane-sandy_n_2066281.html • http://mashable.com/2012/10/31/hurricane-sandy-facebook/
  25. For information seeking For timely information For unique information For unfiltered information To determine disaster magnitude To check in with family and friends To self-mobilize To maintain a sense of community To seek emotional support and healing Governments Emergency management organizations Journalists Disaster responders Public BIG DATA TO SMART DATA: WHY? and FOR WHOM? 25 Fraustino et al. Social Media Use during Disasters: A Review of the Knowledge Base and Gaps. US Dept. of Homeland Security, START 2012.
  26. Improving situational awareness - Timely delivery of necessary information to the right people Improving coordination between resource seekers and suppliers Detecting the magnitude of the disaster from people’s sentiments. Many more challenges… Can SNS’s make Disaster Management easier – Giving Actionable Information (Smart Data) http://www.buzzfeed.com/annanorth/how-social-media-is-aiding-the-hurricane-sandy-rec http://blog.twitter.com/2012/10/hurricane-sandy-resources-on-twitter.html http://www.treehugger.com/culture/12-ways-help-hurricane-sandy-relief-efforts.html
  27. Volume Twitter hits half a billion tweets a day! Challenges Delivering the necessary/actionable information to the right people http://news.cnet.com/8301-1023_3-57541566-93/report-twitter-hits-half-a-billion-tweets-a-day/ http://semiocast.com/en/publications/2012_07_30_Twitter_reaches_half_a_billion_accounts_140m_in_the_US
  28. Velocity Volume @ConEdison Twitter handle that the company had only set up in June gained an extra 16,000 followers over the storm. – Did the information reach everyone? Challenges Delivering the necessary/actionable information to the right people Rate of Data Arrival Approximately 7000 TPS 10 images per second on instagram 28 http://news.cnet.com/8301-1023_3-57541566-93/report-twitter-hits-half-a-billion-tweets-a-day/ http://semiocast.com/en/publications/2012_07_30_Twitter_reaches_half_a_billion_accounts_140m_in_the_US http://www.internews.org/sites/default/files/resources/InternewsEurope_Report_Japan_Connecting%20the%20last%20mile%20Japan_2013.pdf
  29. http://news.cnet.com/8301-1023_3-57541566-93/report-twitter-hits-half-a-billion-tweets-a-day/ http://semiocast.com/en/publications/2012_07_30_Twitter_reaches_half_a_billion_accounts_140m_in_the_US Velocity Variety Volume Semi Structured Structured Unstructured Sensors Linked Open Data Wikipedia Challenges Delivering the necessary/actionable information to the right people
  30. Velocity Variety Veracity Volume Challenges Delivering the necessary/actionable information to the right people http://www.buzzfeed.com/jackstuef/the-man-behind-comfortablysmug-hurricane-sandys
  31. Velocity Variety Veracity Volume 31
  32. Value -Makes Sense -Actionable Information -Decision support/making Data http://www.wired.com/insights/2013/04/big-data-fast-data-smart-data/ 32 Smart Data focuses on the value
  33. Value -Makes Sense -Actionable Information -Decision support/making Disaster Management Victims Timely and Contextual Information about • Electricity, Food, Water, Shelter and donation offers related to the disaster. Data http://www.wired.com/insights/2013/04/big-data-fast-data-smart-data/ 33
  34. Descriptive Exploratory Inferential Predictive Causal Human Centric Computing Improved Analytics Creation Processing Experience 34 Revisiting..
  35. Applications of Smart Data Analytics • Healthcare – kHealth – SemHealth • Social event coordination – Twitris • Traffic monitoring – kTraffic
  36. The Patient of the Future MIT Technology Review, 2012 http://www.technologyreview.com/featuredstory/426968/the-patient-of-the-future/ 36
  37. To gain new insight in patient care & early indications of disease 37 Smart Data in Healthcare
  38. Sensing is a key enabler of the Internet of Things BUT, how do we make sense of the resulting avalanche of sensor data? 50 Billion Things by 2020 (Cisco) 38
  39. Big Data to Smart Data: Healthcare example. Parkinson’s disease (PD) data from The Michael J. Fox Foundation for Parkinson’s Research [1]. 8 weeks of data from 5 sensors on a smartphone, collected for 16 patients, resulting in ~12 GB (with a lot of missing data). Variety Volume Veracity Velocity Value semantics WHY: Can we detect the onset of Parkinson’s disease? Can we characterize the disease progression? Can we provide actionable information to the patient? Representing prior knowledge of PD led to a focused exploration of this massive dataset. [1] https://www.kaggle.com/c/predicting-parkinson-s-disease-progression-with-smartphone-data
  40. Big Data to Smart Data Using a Knowledge Based Approach. Declarative Knowledge of Parkinson’s Disease used to focus our attention on symptom manifestations in sensor observations: ParkinsonMild(person) = Tremor(person) ∧ PoorBalance(person); ParkinsonModerate(person) = MoveSlow(person) ∧ PoorSleep(person) ∧ MonotoneSpeech(person); ParkinsonAdvanced(person) = Fall(person). Control Group: movements of an active person have a good distribution over the X, Y, and Z axes; audio is well modulated, with good variations in the energy of the voice. PD Patients: restricted movements by a PD patient can be seen in the acceleration readings; audio is not well modulated, representing monotone speech.
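
The three declarative rules on this slide can be read as simple Boolean conjunctions over detected symptoms. The following is a minimal sketch of that reading, not the authors' implementation: the `Symptoms` class and `parkinson_stages` function are hypothetical names, and the symptom flags are assumed to have been derived already from the sensor observations.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Symptoms:
    """Symptom flags assumed to be extracted upstream from sensor data."""
    tremor: bool = False
    poor_balance: bool = False
    move_slow: bool = False
    poor_sleep: bool = False
    monotone_speech: bool = False
    fall: bool = False


def parkinson_stages(s: Symptoms) -> List[str]:
    """Return every stage whose rule from the slide is satisfied."""
    stages = []
    # ParkinsonMild(person) = Tremor(person) AND PoorBalance(person)
    if s.tremor and s.poor_balance:
        stages.append("ParkinsonMild")
    # ParkinsonModerate(person) = MoveSlow AND PoorSleep AND MonotoneSpeech
    if s.move_slow and s.poor_sleep and s.monotone_speech:
        stages.append("ParkinsonModerate")
    # ParkinsonAdvanced(person) = Fall(person)
    if s.fall:
        stages.append("ParkinsonAdvanced")
    return stages
```

For example, `parkinson_stages(Symptoms(tremor=True, poor_balance=True))` yields only the mild stage, which is how the rules narrow attention to a few symptom manifestations in the raw data.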
  41. Asthma: Severity of the problem • 25 million people in the U.S. are diagnosed with asthma (7 million are children) [1]. • 300 million people suffer from asthma worldwide [2]. • Asthma-related healthcare costs alone are around $50 billion a year [2]. • 155,000 hospital admissions and 593,000 emergency department visits in 2006 [3]. [1] http://www.nhlbi.nih.gov/health/health-topics/topics/asthma/ [2] http://www.lung.org/lung-disease/asthma/resources/facts-and-figures/asthma-in-adults.html [3] Akinbami et al. (2009). Status of childhood asthma in the United States, 1980–2007. Pediatrics, 123(Supplement 3), S131-S145.
  42. Big Data to Smart Data: Healthcare example. Asthma is a multifactorial disease with health signals spanning personal, public health, and population levels. Real-time health signals from the personal level (e.g., Wheezometer, NO in breath, accelerometer, microphone), public health (e.g., CDC, hospital EMR), and population level (e.g., pollen level, CO2) arrive continuously in fine-grained samples, potentially with missing information and uneven sampling frequencies. Variety Volume Veracity Velocity Value semantics WHY: Can we detect the asthma severity level? Can we characterize the asthma control level? What risk factors influence asthma control? What is the contribution of each risk factor? Understanding relationships between health signals and asthma attacks for providing actionable information.
  43. 43 Population Level Personal Public Health Variety: Health signals span heterogeneous sources Volume: Health signals are fine grained Velocity: Real-time change in situations Veracity: Reliability of health signals may be compromised Value: Can I reduce my asthma attacks at night? Decision support to doctors by providing them with deeper insights into patient asthma care Asthma: Demonstration of Value
  44. 44 Sensordrone – for monitoring environmental air quality Wheezometer – for monitoring wheezing sounds Can I reduce my asthma attacks at night? What are the triggers? What is the wheezing level? What is the propensity toward asthma? What is the exposure level over a day? What is the air quality indoors? Commute to Work Personal Public Health Population Level Closing the window at home in the morning and taking an alternate route to office may lead to reduced asthma attacks Actionable Information Asthma: Actionable Information for Asthma Patients
  45. Asthma Control and Actionable Information. Personal, Public Health, and Population Level Signals for Monitoring Asthma. Sensors and their observations for understanding asthma. Asthma Control => Daily Medication:

      | Severity Level of Asthma | Choices for starting therapy (Recommended Action) | Not Well Controlled (Recommended Action) | Poorly Controlled (Recommended Action) |
      |---|---|---|---|
      | Intermittent Asthma | SABA prn | - | - |
      | Mild Persistent Asthma | Low dose ICS | Medium ICS | Medium ICS |
      | Moderate Persistent Asthma | Medium dose ICS alone, or with LABA/montelukast | Medium ICS + LABA/montelukast, or high dose ICS | Medium ICS + LABA/montelukast, or high dose ICS* |
      | Severe Persistent Asthma | High dose ICS with LABA/montelukast | Needs specialist care | Needs specialist care |

      ICS = inhaled corticosteroid, LABA = inhaled long-acting beta2-agonist, SABA = inhaled short-acting beta2-agonist; *consider referral to specialist
  46. 46 Personal Level Signals Societal Level Signals (Personal Level Signals) (Personalized Societal Level Signal) (Societal Level Signals) Societal Level Signals Relevant to the Personal Level Personal Level Sensors (kHealth**) (EventShop*) Qualify Quantify Action Recommendation What are the features influencing my asthma? What is the contribution of each of these features? How controlled is my asthma? (risk score) What will be my action plan to manage asthma? Storage Societal Level Sensors Asthma Early Warning Model (AEWM) Query AEWM Verify & augment domain knowledge Recommended Action Action Justification Asthma Early Warning Model *http://www.slideshare.net/jain49/eventshop-120721, ** http://www.youtube.com/watch?v=btnRi64hJp4
  47. Health Signal Extraction to Understanding. Physical-Cyber-Social System observations: signals from personal, personal spaces, and community spaces (Personal, Public Health, Population Level). Observations: Wheeze – Yes; Do you have tightness of chest? – Yes; sensor and personal observations; acceleration readings from on-phone sensors; tweet reporting pollution level and asthma attacks; outdoor pollen and pollution. Health Signal Extraction: <Wheezing=Yes, time, location> <ChestTightness=Yes, time, location> <PollenLevel=Medium, time, location> <Pollution=Yes, time, location> <Activity=High, time, location>. Health Signal Understanding (Qualify, Quantify, Enrich; Expert Knowledge, Background Knowledge): Wheezing, ChestTightness, PollenLevel, Pollution and Activity are related to RiskCategory. Encoded observations <PollenLevel, ChestTightness, Pollution, Activity, Wheezing, RiskCategory>: <2, 1, 1, 3, 1, RiskCategory>, <2, 1, 1, 3, 1, RiskCategory>, … Risk Category assigned by doctors: Well Controlled – continue; Not Well Controlled – contact nurse; Poorly Controlled – contact doctor.
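
The step from extracted signals such as <PollenLevel=Medium, …> to a numeric tuple such as <2, 1, 1, 3, 1, RiskCategory> can be sketched as a lookup into an ordinal scale. This is an illustrative sketch only: the slide implies Yes→1, Medium→2 and High→3, while the values for `No` and `Low`, and the names `LEVELS`, `FEATURE_ORDER` and `encode_signals`, are assumptions made here.

```python
from typing import Dict, Tuple

# Ordinal codes; "No" and "Low" are assumed, the rest follow the slide's example.
LEVELS: Dict[str, int] = {"No": 0, "Yes": 1, "Low": 1, "Medium": 2, "High": 3}

# Feature order used by the encoded tuples on the slide (RiskCategory excluded,
# since it is the label assigned by doctors rather than an observed signal).
FEATURE_ORDER = ("PollenLevel", "ChestTightness", "Pollution", "Activity", "Wheezing")


def encode_signals(signals: Dict[str, str]) -> Tuple[int, ...]:
    """Map qualitative health signals to the numeric feature tuple."""
    return tuple(LEVELS[signals[name]] for name in FEATURE_ORDER)


observed = {
    "Wheezing": "Yes",
    "ChestTightness": "Yes",
    "PollenLevel": "Medium",
    "Pollution": "Yes",
    "Activity": "High",
}
features = encode_signals(observed)  # (2, 1, 1, 3, 1)
```

Each encoded tuple would then be paired with the doctor-assigned risk category to form one training example.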
  48. … and do it efficiently and at scale What if we could automate this sense making ability? 48
  49. People are good at making sense of sensory input What can we learn from cognitive models of perception? • The key ingredient is prior knowledge 49
  50. * based on Neisser’s cognitive model of perception Observe Property Perceive Feature Explanation Discrimination 1 2 Perception Cycle* Translating low-level signals into high-level knowledge Focusing attention on those aspects of the environment that provide useful information Prior Knowledge 50
  51. To enable machine perception, Semantic Web technology is used to integrate sensor data with prior knowledge on the Web 51
  52. Prior knowledge on the Web W3C Semantic Sensor Network (SSN) Ontology Bi-partite Graph 52
  53. Prior knowledge on the Web W3C Semantic Sensor Network (SSN) Ontology Bi-partite Graph 53
  54. Observe Property Perceive Feature Explanation 1 Translating low-level signals into high-level knowledge Explanation Explanation is the act of choosing the objects or events that best account for a set of observations; often referred to as hypothesis building 54
  55. Explanation Inference to the best explanation • In general, explanation is an abductive problem; and hard to compute Finding the sweet spot between abduction and OWL • Single-feature assumption* enables use of OWL-DL deductive reasoner * An explanation must be a single feature which accounts for all observed properties Explanation is the act of choosing the objects or events that best account for a set of observations; often referred to as hypothesis building 55
  56. Explanation Explanatory Feature: a feature that explains the set of observed properties ExplanatoryFeature ≡ ∃ssn:isPropertyOf⁻.{p1} ⊓ … ⊓ ∃ssn:isPropertyOf⁻.{pn} elevated blood pressure clammy skin palpitations Hypertension Hyperthyroidism Pulmonary Edema Observed Property Explanatory Feature
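
Under the single-feature assumption, the definition above amounts to keeping every feature whose known properties include all observed properties. The sketch below expresses that with plain sets rather than an OWL reasoner. The property-to-feature links in `PROPERTIES_OF` are assumed for illustration (the slide names the nodes of the bipartite graph but the transcript does not preserve its edges), and `explanatory_features` is a hypothetical helper name.

```python
from typing import Dict, FrozenSet, Set

# Assumed bipartite knowledge base: feature -> properties it can explain.
PROPERTIES_OF: Dict[str, FrozenSet[str]] = {
    "Hypertension": frozenset({"elevated blood pressure"}),
    "Hyperthyroidism": frozenset(
        {"elevated blood pressure", "clammy skin", "palpitations"}
    ),
    "Pulmonary Edema": frozenset({"elevated blood pressure", "palpitations"}),
}


def explanatory_features(observed: Set[str]) -> Set[str]:
    """Features that account for every observed property."""
    return {
        feature
        for feature, properties in PROPERTIES_OF.items()
        if observed <= properties  # all observed properties belong to the feature
    }
```

With these assumed links, observing elevated blood pressure and palpitations leaves Hyperthyroidism and Pulmonary Edema as candidate explanations.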
  57. Discrimination is the act of finding those properties that, if observed, would help distinguish between multiple explanatory features Observe Property Perceive Feature Explanation Discrimination 2 Focusing attention on those aspects of the environment that provide useful information Discrimination 57
  58. Discrimination Expected Property: would be explained by every explanatory feature ExpectedProperty ≡ ∃ssn:isPropertyOf.{f1} ⊓ … ⊓ ∃ssn:isPropertyOf.{fn} elevated blood pressure clammy skin palpitations Hypertension Hyperthyroidism Pulmonary Edema Expected Property Explanatory Feature 58
  59. Discrimination Not Applicable Property: would not be explained by any explanatory feature NotApplicableProperty ≡ ¬∃ssn:isPropertyOf.{f1} ⊓ … ⊓ ¬∃ssn:isPropertyOf.{fn} elevated blood pressure clammy skin palpitations Hypertension Hyperthyroidism Pulmonary Edema Not Applicable Property Explanatory Feature 59
  60. Discrimination Discriminating Property: is neither expected nor not-applicable DiscriminatingProperty ≡ ¬ExpectedProperty ⊓ ¬NotApplicableProperty elevated blood pressure clammy skin palpitations Hypertension Hyperthyroidism Pulmonary Edema Discriminating Property Explanatory Feature 60
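
The three property classes defined on the discrimination slides (expected, not applicable, discriminating) can be computed directly from a set of candidate explanatory features. The following is a set-based sketch rather than the OWL formulation: the links in `PROPERTIES_OF` are assumed for illustration, `classify_properties` is a hypothetical name, and the function assumes at least one explanatory feature is given.

```python
from typing import Dict, FrozenSet, Set, Tuple

# Assumed bipartite knowledge base: feature -> properties it can explain.
PROPERTIES_OF: Dict[str, FrozenSet[str]] = {
    "Hypertension": frozenset({"elevated blood pressure"}),
    "Hyperthyroidism": frozenset(
        {"elevated blood pressure", "clammy skin", "palpitations"}
    ),
    "Pulmonary Edema": frozenset({"elevated blood pressure", "palpitations"}),
}


def classify_properties(
    explanatory: Set[str],
) -> Tuple[Set[str], Set[str], Set[str]]:
    """Split all known properties into (expected, not_applicable, discriminating).

    Assumes `explanatory` is non-empty.
    """
    all_properties = set().union(*PROPERTIES_OF.values())
    expected, not_applicable = set(), set()
    for prop in all_properties:
        explained_by = {f for f in explanatory if prop in PROPERTIES_OF[f]}
        if explained_by == explanatory:
            expected.add(prop)        # explained by every candidate feature
        elif not explained_by:
            not_applicable.add(prop)  # explained by no candidate feature
    # Discriminating: neither expected nor not applicable.
    discriminating = all_properties - expected - not_applicable
    return expected, not_applicable, discriminating
```

With Hyperthyroidism and Pulmonary Edema as candidates, only clammy skin is discriminating under these assumed links, so it is the one property worth observing next.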
  61. Through physical monitoring and analysis, our cellphones could act as an early warning system to detect serious health conditions, and provide actionable information canary in a coal mine Our Motivation kHealth: knowledge-enabled healthcare 61
  62. Qualities -High BP -Increased Weight Entities -Hypertension -Hypothyroidism kHealth Machine Sensors Personal Input EMR/PHR Comorbidity risk score e.g., Charlson Index Longitudinal studies of cardiovascular risks - Find correlations - Validation - domain knowledge - domain expert Parameterize the model Risk Assessment Model Current Observations -Physical -Physiological -History Risk Score (Actionable Information) Model Creation Validate correlations Historical observations of each patient Risk Score: from Data to Abstraction and Actionable Information
  63. How do we implement machine perception efficiently on a resource-constrained device? Use of an OWL reasoner is resource intensive (especially on resource-constrained devices), in terms of both memory and time • Runs out of resources with prior knowledge >> 15 nodes • Asymptotic complexity: O(n^3)
  64. Intelligence at the edge. Approach 1: Send all sensor observations to the cloud for processing. Approach 2: Downscale semantic processing so that each device is capable of machine perception. Henson et al. 'An Efficient Bit Vector Approach to Semantics-based Machine Perception in Resource-Constrained Devices', ISWC 2012.
  65. Efficient execution of machine perception Use bit vector encodings and their operations to encode prior knowledge and execute semantic reasoning 010110001101 0011110010101 1000110110110 101100011010 0111100101011 000110101100 0110100111 65
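
To illustrate the idea of encoding prior knowledge as bit vectors, the sketch below stores, for each property, one integer whose bits mark the features that have that property. Explanation then reduces to a bitwise AND, and discrimination to a comparison of masks. This is a toy encoding written for this transcript, not the encoding from the cited ISWC 2012 paper; the knowledge-base links and all function names are assumptions.

```python
from typing import Dict, Iterable, List

FEATURES: List[str] = ["Hypertension", "Hyperthyroidism", "Pulmonary Edema"]

# Bit i is set when FEATURES[i] has the property (links assumed for illustration).
PROPERTY_BITS: Dict[str, int] = {
    "elevated blood pressure": 0b111,  # all three features
    "clammy skin": 0b010,              # Hyperthyroidism only
    "palpitations": 0b110,             # Hyperthyroidism, Pulmonary Edema
}

ALL_FEATURES_MASK = (1 << len(FEATURES)) - 1


def explain(observed: Iterable[str]) -> int:
    """AND the vectors of all observed properties to get explanatory features."""
    mask = ALL_FEATURES_MASK
    for prop in observed:
        mask &= PROPERTY_BITS[prop]
    return mask


def decode(mask: int) -> List[str]:
    """Translate a feature bit mask back into feature names."""
    return [name for i, name in enumerate(FEATURES) if (mask >> i) & 1]


def discriminating(mask: int) -> List[str]:
    """Properties held by some, but not all, of the features in the mask."""
    return [
        prop
        for prop, bits in PROPERTY_BITS.items()
        if (bits & mask) not in (0, mask)
    ]
```

Each observed property costs one AND on a machine word, which gives an intuition for why a bit vector encoding scales far better on a phone than general-purpose OWL reasoning.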
  66. Evaluation on a mobile device. Efficiency Improvement • Problem size increased from 10’s to 1000’s of nodes • Time reduced from minutes to milliseconds • Complexity growth reduced from polynomial, O(n^3) < x < O(n^4), to linear, O(n)
  67. Semantic Perception for smarter analytics: 3 ideas to take away. 1 Translate low-level data to high-level knowledge: Machine perception can be used to convert low-level sensory signals into high-level knowledge useful for decision making. 2 Prior knowledge is the key to perception: Using SW technologies, machine perception can be formalized and integrated with prior knowledge on the Web. 3 Intelligence at the edge: By downscaling semantic inference, machine perception can execute efficiently on resource-constrained devices.
  68. • Real Time Feature Streams: http://www.youtube.com/watch?v=_ews4w_eCpg • kHealth: http://www.youtube.com/watch?v=btnRi64hJp4 68 Demos
  69. 73 Smart Data in Social Media Analytics To Understand the human social dynamics in real world events
  70. 0.5B Tweets per day 0.5B Users 60% on Mobile 5530 Tweets per second related to the Japan earthquake and tsunami 17000 Tweets per second Twitter During Real-world Events of Interest http://www.flickr.com/photos/twitteroffice/5897088517/sizes/o/in/photostream/ http://bayarea.sbnation.com/49ers/2013/2/3/3947738/super-bowl-prop-bets-2013-twitter http://expandedramblings.com/index.php/march-2013-by-the-numbers-a-few-amazing-twitter-stats/
  71. http://usatoday30.usatoday.com/news/politics/twitter-election-meter http://twitris.knoesis.org/
  72. State of the Art – Uni/Bi Dimensional Analysis During Elections Topics Sentiments 76
  73. Twitris’ Dimensions of Integrated Semantic Analysis 77Sheth et al. Twitris- a System for Collective Social Intelligence, ESNAM-2013
  74. 78 http://semanticweb.com/picking-the-president-twindex-twitris-track-social-media-electorate_b31249 http://semanticweb.com/election-2012-the-semantic-recap_b33278
  75. [The screenshots of Twitris+ were taken on Nov. 6th 6 PM EST]
  76. 80 Twitris: Sentiment Analysis- Smart Answers with reasoning! How was Obama doing in the first debate?
  77. 81 Red Color: Negative Topics Green Color: Positive Topics Twitris: Sentiment Analysis- Smart Answers with reasoning! How was Obama doing in the second debate? SMART DATA IS ABOUT ANALYSIS FOR REASONING (what caused the positive sentiment for Democrats) BEHIND THE REAL-WORLD ACTIONS (Democrats’ win) http://knoesis.wright.edu/library/resource.php?id=1787
  78. Twitris: Network Analysis. Top 100 influential users who talk about Barack Obama. Positive or Negative Influence. SMART DATA TELLS YOU HOW A SYSTEM CAN BE TWEAKED FOR THE DESIRED ACTIONS! Could we engage with (targeted) users with an extreme polarity leaning for Obama to spark an agenda in the whole network of voters (ACTION)?
  79. Twitris: Community Evolution SMART DATA FOCUSES ON THE CAUSALITY OF CHANGES IN REAL-WORLD ACTIONS! Romney Obama Evolution of influencer interaction networks for Romney vs. Obama topical communities, during U.S. Presidential Election 2012 debates Before 1st debate After 1st debate After Hurricane Sandy After 3rd debate 83
  80. The Dead People mentioned in the event OWC Twitris: Impact of Background Knowledge 84
  81. How People from Different parts of the world talked about US Election Images and Videos Related to US Election Twitris: Analysis by Location 85
  82. What is Smart Data in the context of Disaster Management? ACTIONABLE: Timely delivery of the right resources and information to the right people at the right location! Because everyone wants to Help, but DOESN’T KNOW HOW!
  83. Join us for the Social Good! http://twitris.knoesis.org RT @OpOKRelief: Southgate Baptist Church on 4th Street in Moore has food, water, clothes, diapers, toys, and more. If you can't go,call 794 Text "FOOD" to 32333, REDCROSS to 90999, or STORM to 80888 to donate $10 in storm relief. #moore #oklahoma #disasterrelief #donate Want to help animals in #Oklahoma? @ASPCA tells how you can help: http://t.co/mt8l9PwzmO CITIZEN SENSORS RESPONSE TEAMS (including humanitarian org. and ‘pseudo’ responders) VICTIM SITE Coordination of needs and offers Using Social Media Does anyone know where to send a check to donate to the tornado victims? Where do I go to help out for volunteer work around Moore? Anyone know? Anyone know where to donate to help the animals from the Oklahoma disaster? #oklahoma #dogs Matched Matched Matched Serving the need! If you would like to volunteer today, help is desperately needed in Shawnee. Call 273-5331 for more info http://www.slideshare.net/hemant_knoesis/cscw-2012-hemantpurohit-11531612 Purohit et al. Framework to Analyze Coordination in Crisis Response, 2012. Int’l Collaboration in-progress:
  84. Smart Data from Twitris system for Disaster Response Coordination. Which are the primary locations with the most negative sentiments/emotions? Who are all the people to engage with for better information diffusion? Which are the most important organizations acting at my location? Who are the resource seekers and suppliers? How can one donate? Smart data provides actionable information and improves decision making through semantic analysis of Big Data.
  85. Disaster Response Coordination Framework. Source: Purohit et al. 2013, Information Filtering and Management Model for Disaster Response Coordination
  86. Disaster Response Coordination: Twitris Summary for Actionable Nuggets 90 Important tags to summarize Big Data flow Related to Oklahoma tornado Images and Videos Related to Oklahoma tornado
  87. 91 Disaster Response Coordination: Twitris Real-time information for needs Incoming Tweets with need types to give quick idea of what is needed and where currently #OKC Legends for Different needs #OKC (It is real-time widget for monitoring of needs, so will not be active after the event has passed) http://twitris.knoesis.org/oklahomatornado
  88. Disaster Response Coordination: Influencers to engage with for specific needs. Influential users for the respective needs, with their interaction network on the right.
  89. Really sparse Signal to Noise: • 2M tweets during the first week after #Oklahoma-tornado-2013 - 1.3% as the highly precise donation requests to help - 0.02% as the highly precise donation offers to help 93 • Anyone know how to get involved to help the tornado victims in Oklahoma??#tornado #oklahomacity (OFFER) • I want to donate to the Oklahoma cause shoes clothes even food if I can (OFFER) Disaster Response Coordination: Finding Actionable Nuggets for Responders to act • Text REDCROSS to 909-99 to donate to those impacted by the Moore tornado! http://t.co/oQMljkicPs (REQUEST) • Please donate to Oklahoma disaster relief efforts.: http://t.co/crRvLAaHtk (REQUEST) For responders, most important information is the scarcity and availability of resources, can we mine it via Social Media?
  90. • Features driven by the experience of domain experts at the responder organizations • Examples, – ‘I want to <donate/ help/ bring>’ for extraction of offering intention – ‘tent house’ OR ‘cots’ for shelter need types 94 Disaster Response Coordination: Human Knowledge to drive information extraction
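
The two example features on this slide can be approximated with regular expressions. This is a minimal sketch assuming simple keyword patterns; the tag names, `OFFER_PATTERN`, `SHELTER_PATTERN` and `tag_tweet` are hypothetical, and a production extractor would need many more expert-curated patterns.

```python
import re
from typing import Set

# 'I want to <donate/ help/ bring>' signals an offering intention.
OFFER_PATTERN = re.compile(r"\bI want to (donate|help|bring)\b", re.IGNORECASE)

# 'tent house' OR 'cots' signals a shelter need type.
SHELTER_PATTERN = re.compile(r"\b(tent house|cots)\b", re.IGNORECASE)


def tag_tweet(text: str) -> Set[str]:
    """Return the set of tags whose pattern occurs in the tweet text."""
    tags = set()
    if OFFER_PATTERN.search(text):
        tags.add("OFFER")
    if SHELTER_PATTERN.search(text):
        tags.add("SHELTER_NEED")
    return tags
```

Applied to the offer tweet quoted elsewhere in the deck ("I want to donate to the Oklahoma cause shoes clothes even food if I can"), the function returns the `OFFER` tag.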
  91. • A knowledge-driven approach – A rich inventory of metadata for tweets – Semantic matching for needs (query) vs. offers (documents) • Example, – @bladesofmilford please help get the word out,we are accepting kid clothes to send to the lil angels in Oklahoma.Drop off @MilfordGreenPiz (REQUEST) – I want to donate to the Oklahoma cause shoes clothes even food if I can (OFFER) 95 Disaster Response Coordination: Automatic Matching of needs and offers Matching the competitive intentions (Needs and Offers) can offload humans for the task of resource matchmaking for coordination.
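
As a rough illustration of matching needs against offers, the sketch below maps words in each tweet to resource types through a small lexicon and pairs a request with an offer when they share a type. It is a deliberately simplified stand-in for the semantic matching described on the slide, not the Twitris method; `RESOURCE_LEXICON` and both function names are assumptions.

```python
import re
from typing import Dict, Set

# Hypothetical lexicon mapping surface words to resource types.
RESOURCE_LEXICON: Dict[str, str] = {
    "clothes": "clothing",
    "shoes": "clothing",
    "food": "food",
    "water": "water",
    "cots": "shelter",
}


def resource_types(text: str) -> Set[str]:
    """Resource types mentioned in a tweet, according to the lexicon."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return {RESOURCE_LEXICON[t] for t in tokens if t in RESOURCE_LEXICON}


def match(request: str, offer: str) -> Set[str]:
    """Resource types that the request needs and the offer provides."""
    return resource_types(request) & resource_types(offer)


request = (
    "@bladesofmilford please help get the word out,we are accepting kid clothes "
    "to send to the lil angels in Oklahoma.Drop off @MilfordGreenPiz"
)
offer = "I want to donate to the Oklahoma cause shoes clothes even food if I can"
shared = match(request, offer)  # {"clothing"}
```

Treating the need as a query and the offers as documents, as the slide suggests, would replace the set intersection with a ranked retrieval step.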
  92. 96 Disaster Response Coordination: Engagement Interface for responders What-Where-How-Who-Why Coordination Influential users to engage with and resources for seekers/supplies at a location, at a timestamp Contextual Information for a chosen topical tags
  93. Disaster Response Coordination: Anecdote for the value of Smart Data • Illustrative scenario: #Oklahoma-tornado 2013. FEMA asked us to quickly filter out gas-leak related data. Mining the data for smart nuggets to inform FEMA (Timely needs). Engaged with the author of this information to confirm (Veracity), e.g., All gas leaks in #moore were capped and stopped by 11:30 last night (at 5/22/2013 1:41:37). Lots of tweets on ‘how to/where to’ assist (‘pseudo’ responders), e.g., I want to go to Oklahoma this weekend & do what i can to help those people with food,cloths & supplies,im in the feel of wanting to help ! :)
  94. An event is a dynamic topic that evolves and might later fork into several distinct events. Smart Data analytics to capture rapidly evolving social data events 98 Social Media is the pulse of the populace, a true reflection of events all over the globe!
  95. Continuous Semantics 99
  96. Dynamic Model Creation Continuous Semantics 100
  97. Dynamic Model Creation: Example of how background knowledge helps in understanding the situation described in the tweets, while also updating the knowledge model
  98. How is Continuous Semantics a form of Smart Data Analytics? Keeping the Background Knowledge abreast of the changes in the event. Smartly learning and adapting data acquisition (temporally apt Big Data, i.e. Fast Data). In turn providing temporally relevant Smart Data through analysis.
  99. Smart Data Analytics in Traffic Management: To improve everyday life, which is hampered by our most common problem of being stuck in traffic
  100. Severity of the Traffic Problem. By 2001 over 285 million Indians lived in cities, more than in all North American cities combined (Office of the Registrar General of India 2001) [1]. Modes of transportation in Indian Cities. Texas Transportation Institute (TTI) Congestion report in U.S. [1] The Crisis of Public Transport in India [2] IBM Smarter Traffic
  101. Big Data to Smart Data: Traffic Management example. Vehicular traffic data from the San Francisco Bay Area, aggregated from on-road sensors (numerical) and incident reports (textual), http://511.org/. Every-minute updates of speed, volume, travel time, and occupancy, resulting in 178 million link status observations, 738 active events, and 146 scheduled events, with many unevenly sampled observations, collected over 3 months. Variety Volume Veracity Velocity Value semantics Can we detect the onset of traffic congestion? Can we characterize traffic congestion based on events? Can we provide actionable information to decision makers? Representing prior knowledge of traffic led to a focused exploration of this massive dataset.
  102. Slow moving traffic Link Description Scheduled Event Scheduled Event 511.org 511.org Schedule Information 511.org Traffic Monitoring 106 Heterogeneity in a Physical-Cyber-Social System
  103. 107 Heterogeneity in a Physical-Cyber-Social System
  104. • Observation: Slow Moving Traffic • Multiple Causes (Uncertain about the cause): – Scheduled Events: music events, fair, theatre events, concerts, road work, repairs, etc. – Active Events: accidents, disabled vehicles, break down of roads/bridges, fire, bad weather, etc. – Peak hour: e.g. 7 am – 9 am OR 4 pm – 6 pm • Each of these events may have a varying impact on traffic. • A delay prediction algorithm should process multimodal and multi-sensory observations. Uncertainty in a Physical-Cyber-Social System 108
  105. • Internal observations – Speed, volume, and travel time observations – Correlations may exist between these variables across different parts of the network • External events – Accident, music event, sporting event, and planned events – External events and internal observations may exhibit correlations Modeling Traffic Events 109
  106. Accident Music event Sporting event Road Work Theatre event External events <ActiveEvents, ScheduledEvents> Internal observations <speed, volume, travelTime> Weather Time of Day Modeling Traffic Events 110
  107. Domain Experts cold PoorVisibility SlowTraffic IcyRoad Declarative domain knowledge Causal knowledge Linked Open Data Cold (YES/NO) IcyRoad (ON/OFF) PoorVisibility (YES/NO) SlowTraffic (YES/NO) 1 0 1 1 1 1 1 0 1 1 1 1 1 0 1 0 Domain Observations Domain Knowledge Structure and parameters Complementing Probabilistic Models with Declarative Knowledge 112 Correlations to causations using Declarative knowledge on the Semantic Web
  108. • Declarative knowledge about various domains is increasingly being published on the web [1, 2]. • Declarative knowledge describes concepts and relationships in a domain (structure). • Linked Open Data may be used to derive prior probabilities of events (parameters). • We explored the use of declarative knowledge for structure using ConceptNet 5. [1] http://conceptnet5.media.mit.edu/ [2] http://linkeddata.org/ Domain Knowledge 113
  109. http://conceptnet5.media.mit.edu/web/c/en/traffic_jam ConceptNet 5 114 [Figure: ConceptNet 5 graph for the traffic domain. The concepts traffic jam, slow traffic, traffic accident, accident, car crash, bad weather, road ice, go to baseball game, go to concert and occur twice each day are connected by Causes, CapableOf, HasSubevent and RelatedTo relations, and are linked by is_a to the classes Delay, ActiveEvent, ScheduledEvent, BadWeather and TimeOfDay.]
  110. Three Operations: Complementing graphical model structure extraction 115 [Figure: traffic data from sensors deployed on the road network in the San Francisco Bay Area yields a structure over the nodes baseball game, traffic jam, time of day, slow traffic and bad weather, which is enriched in three steps: add missing random variables, add missing links, add link direction. Knowledge from ConceptNet5: bad weather CapableOf slow traffic; go to baseball game Causes traffic jam; traffic jam CapableOf occur twice each day; traffic jam CapableOf slow traffic.]
  111. 116 Scheduled Event Active Event Day of week Time of day delay Travel time speed volume Structure extracted from traffic observations (sensors + textual) using statistical techniques Scheduled Event Active Event Day of week Time of day delay Travel time speed volume Bad Weather Enriched structure which has link directions and new nodes such as “Bad Weather” potentially leading to better delay predictions Enriched Probabilistic Models using ConceptNet 5
  112. Take Away • It is all about the human – not computing, not device – Computing for human experience • Whatever we do in Smart Data, focus on human-in-the-loop (empowering machine computing!): – Of Human, By Human, For Human – But in serving human needs, there is a lot more than what current big data analytics handle – variety, contextual, personalized, subjective, spanning data and knowledge across P-C-S dimensions 118
  113. Acknowledgements • Kno.e.sis team • Funds: NSF, NIH, AFRL, Industry… • Note: • For images and sources, if not on slides, please see slide notes • Some images were taken from Web search results and all such images belong to their respective owners; we are grateful to the owners for the usefulness of these images in our context. 119
  114. • OpenSource: http://knoesis.org/opensource • Showcase: http://knoesis.org/showcase • Vision: http://knoesis.org/node/266 • Publications: http://knoesis.org/library 120 References and Further Readings
  115. Thanks … 121
  116. 122 Physical Cyber Social Computing Amit Sheth, Kno.e.sis, Wright State
  117. Amit Sheth’s PhD students Ashutosh Jadhav Hemant Purohit Vinh Nguyen Lu Chen Pavan Kapanipathi Pramod Anantharam Sujan Perera Alan Smith Pramod Koneru Maryam Panahiazar Sarasi Lalithsena Cory Henson Kalpa Gunaratna Delroy Cameron Sanjaya Wijeratne Wenbo Wang Kno.e.sis in 2012 = ~100 researchers (15 faculty, ~50 PhD students)
  118. 124 thank you, and please visit us at http://knoesis.org Kno.e.sis – Ohio Center of Excellence in Knowledge-enabled Computing Wright State University, Dayton, Ohio, USA Smart Data

Editor's Notes

  1. http://www.knowledgeinfusion.com/blog/2011/11/get-your-head-out-of-the-clouds-and-into-big-data/
  2. http://www.csc.com/insights/flxwd/78931-big_data_growth_just_beginning_to_explode http://www.guardian.co.uk/news/datablog/2012/dec/19/big-data-study-digital-universe-global-volume
  3. Types of data. Formats of data. Also talk about the increase in the platforms that help generate this data.
  4. Example high-velocity Big Data applications at work: financial services, stock brokerage, weather tracking, movies/entertainment and online retail. Fast data (the rate at which data is coming in, especially from mobile, social and sensor sources); rapid changes in the data content; stream analysis to cope with the incoming data for real-time online analytics.
  5. Source: http://techcrunch.com/2012/10/27/big-data-right-now-five-trendy-open-source-technologies
  6. http://radhakrishna.typepad.com/rks_musings/2013/04/big-data-review.html Google predicted the spread of flu in real time after analyzing two datasets: (a) the 50 million most common terms that Americans type, and (b) data on the spread of seasonal flu from the public health agency. It tested a mammoth 450 million different mathematical models to test the search terms, comparing their predictions against the actual flu cases. The model was tested when the H1N1 crisis struck in 2009 and gave more meaningful and valuable real-time information than any official public health system (Big Data, Viktor Mayer-Schonberger and Kenneth Cukier, 2013).
  7. Better Algorithms Beat More Data — And Here’s Why: http://allthingsd.com/20121128/better-algorithms-beat-more-data-and-heres-why/ Big Data Cannot Replace Human Judgment: http://www.matchcite.com/blog/blog/2012/july/big-data-cannot-replace-human-judgment.aspx Comments about the articles.
  8. Smart data makes sense out of big data – it provides value from harnessing the challenges posed by volume, velocity, variety and veracity of big data, to provide actionable information and improve decision making.
  9. - HUMAN CENTRIC!!
  10. Information is CREATED by humans with the machinery available: Wikipedia tools, sensors and social networks. Information is STORED in man- and machine-readable formats (LOD). Information is PROCESSED using the LOD and human-assisted, knowledge-based approaches. Higher-level abstraction on information is now consumed in many mechanistic ways (including GIS) to provide EXPERIENCE for humans.
  11. All the data related to human activity, existence and experiences. More on PCS Computing: http://wiki.knoesis.org/index.php/PCS
  12. Information is CREATED by humans with the machinery available: Wikipedia tools, sensors and social networks. Information is STORED in man- and machine-readable formats (LOD). Information is PROCESSED using the LOD and human-assisted, knowledge-based approaches. Higher-level abstraction on information is now consumed in many mechanistic ways (including GIS) to provide EXPERIENCE for humans. Example of human-guided modeling and improved performance: http://research.microsoft.com/en-us/um/people/akapoor/papers/IJCAI%202011a.pdf
  13. Also, we have a weather application which performs abstraction on weather sensor observations to identify blizzard conditions (food for actions!): 20,000 weather stations (with ~5 sensors per station); Real-Time Feature Streams; live demo: http://knoesis1.wright.edu/EventStreams/ ; video demo: https://skydrive.live.com/?cid=77950e284187e848&sc=photos&id=77950E284187E848%21276
  14. Let's find it.
  15. Starting slide. Various Big Data problems: traditional examples vs. examples of what we are doing. Variety and Velocity rather than Volume. kHealth problem. People will be interested in Smart Data. Traditional ML techniques, High Performance Computing, Statistics. Human level of abstraction is Smart Data.
  16. http://www.huffingtonpost.com/2012/10/30/hurricane-sandy-power-outage-map-infographic_n_2044411.html I would like to start with a motivational example here.
  17. http://www.guardian.co.uk/news/datablog/2012/oct/31/twitter-sandy-flooding http://www.huffingtonpost.com/2012/11/02/twitter-hurricane-sandy_n_2066281.html http://mashable.com/2012/10/31/hurricane-sandy-facebook/ We have quite a bit of social data research going on in our lab, so I would like to focus on the use of social networks during these disasters/crises. Twitter and Facebook are used massively during disasters. During Hurricane Sandy there were … Beyond this, there was a major outbreak of tweets during the Japan earthquake, which crossed more than 2000 tweets/sec. So why do people use social networks to this extent during disasters?
  18. Fraustino, Julia Daisy, Brooke Liu and Yan Jin. “Social Media Use during Disasters: A Review of the Knowledge Base and Gaps,” Final Report to Human Factors/Behavioral Sciences Division, Science and Technology Directorate, U.S. Department of Homeland Security. College Park, MD: START, 2012. Disaster communication deals with disaster information disseminated to the public by governments, emergency management organizations, and disaster responders as well as disaster information created and shared by journalists and the public. Disaster communication increasingly occurs via social media in addition to more conventional communication modes such as traditional media (e.g., newspaper, TV, radio) and word-of-mouth (e.g., phone call, face-to-face, group). Timely, interactive communication and user-generated content are hallmarks of social media, which include a diverse array of web- and mobile-based tools. Disaster communication deals with (1) disaster information disseminated to the public by governments, emergency management organizations, and disaster responders often via traditional and social media; as well as (2) disaster information created and shared by journalists and affected members of the public often through word-of-mouth communication and social media. For information seeking. Disasters often breed high levels of uncertainty among the public (Mitroff, 2004), which prompts them to engage in heightened information seeking (Boyle, Schmierbach, Armstrong, & McLeod, 2004; Procopio & Procopio, 2007). As expected, information seeking is a primary driver of social media use during routine times and during disasters (Liu et al., in press; PEW Internet, 2011). For timely information. Social media provide real-time disaster information, which no other media can provide (Kavanaugh et al., 2011; Kodrich & Laituri, 2011).
Social media can become the primary source of time-sensitive disaster information, especially when official sources provide information too slowly or are unavailable (Spiro et al., 2012). For example, during the 2007 California wildfires, the public turned to social media because they thought journalists and public officials were too slow to provide relevant information about their communities (Sutton, Palen, & Shklovski, 2008). Time-sensitive information provided by social media during disasters is also useful for officials. For example, in an analysis of more than 500 million tweets, Culotta (2010) found Twitter data forecasted future influenza rates with high accuracy during the 2009 pandemic, obtaining a 95% correlation with national health statistics. Notably, the national statistics came from hospital survey reports, which typically had a lag time of one to two weeks for influenza reporting. For unique information. One of the primary reasons the public uses social media during disaster is to obtain unique information (Caplan, Perse, & Gennaria, 2007). Applied to a disaster setting, which is inherently unpredictable and evolving, it follows that individuals turn to whatever source will provide the newest details. Oftentimes, individuals experiencing the event first-hand are on the scene of the disaster and can provide updates more quickly than traditional news sources and disaster response organizations. For instance, in the Mumbai terrorist attacks that included multiple coordinated shootings and bombings across two days, laypersons were first to break the news on Twitter (Merrifield & Palenchar, 2012). Research participants report using social media to satisfy their need to have the latest information available during disasters and for information gathering and sharing during disasters (Palen, Starbird, Vieweg, & Hughes, 2010; Vieweg, Hughes, Starbird, & Palen, 2010). For unfiltered information.
To obtain crisis information, individuals often communicate with one another via social media rather than seeking a traditional news source or organizational website (Stephens & Malone, 2009). The public check in with social media not only to obtain up-to-date, timely information unavailable elsewhere, but also because they appreciate that information may be unfiltered by traditional media, organizations, or politicians (Liu et al., in press). To determine disaster magnitude. The public uses social media to stay apprised of the extent of a disaster (Liu et al., in press). They may turn to governmental or organizational sources for this information, but research has shown that if the public do not receive the information they desire when they desire it, they, along with others, will fill in the blanks (Stephens & Malone, 2009), which can create rumors and misinformation. On the flipside, when the public believed that officials were not disseminating enough information regarding the size and trajectory of the 2007 California wildfires, they took matters into their own hands, using social media to track fire locations in real-time and notify residents who were potentially in danger (Sutton, Palen, & Shklovski, 2008). To check in with family and friends. While Americans predominately use social media to connect with family and friends (PEW Internet, 2011), during disasters those connections may shift. For those with family or friends directly involved with the disaster, social media can provide a way to ensure safety, offer support, and receive timely status updates (Procopio & Procopio, 2007; Stephens & Malone, 2009). In a survey of 1,058 Americans, the American Red Cross (2010) found that nearly half of their respondents would use social media to let loved ones know they are safe during disasters.
After the 2011 earthquake and tsunami in Japan, the public turned to Twitter, Facebook, Skype, and local Japanese social networks to keep in touch with loved ones while mobile networks were down (Gao, Barbier, & Goolsby, 2011). Researchers also note that disasters may enhance feelings of affection toward family members, and indeed survey participants reported expressing more positive emotions toward their loved ones than usual as a result of the September 11 terrorist attacks, even if they were not directly impacted by the disaster (Fredrickson et al., 2003). Finally, disasters can motivate the public to reconnect with family and friends via social media (Procopio & Procopio, 2009; Semaan & Mark, 2012). To self-mobilize. During disasters, the public may use social media to organize emergency relief and ongoing assistance efforts from both near and afar. In fact, one research group dubbed those who surge to the forefront of digital and in-person disaster relief efforts as “voluntweeters” (Starbird & Palen, 2011). Other research documents the role of Facebook and Twitter in disaster relief fundraising (Horrigan & Morris, 2005; PEJ, 2010). Research also reveals how social media can help identify and respond to urgent needs after disasters. For example, just two hours after the 2010 Haitian earthquake Tufts University volunteers created Ushahidi-Haiti, a crisis map where disaster survivors and volunteers could send incident reports via text messages and tweets. In less than two weeks, 2,500 incident reports were sent to the map (Gao, Barbier, & Goolsby, 2011). To maintain a sense of community. During disasters the media in general and social media in particular may provide a unique gratification: sense of community.
That is, as the public logs in online to share their feelings and thoughts, they assist each other in creating a sense of security and community, even when scattered across a vast geographical area (Lev-On, 2011; Procopio & Procopio, 2007). As Reynolds and Seeger (2012) observed, social media create communities during disasters that may be temporary or may continue well into the future. To seek emotional support and healing. Finally, disasters are often inherently tragic, prompting individuals to seek not only information but also human contact, conversation, and emotional care (Sutton et al., 2008). Social media are positioned to facilitate emotional support, allowing individuals to foster virtual communities and relationships, share information and feelings, and even demand resolution (Choi & Lin, 2009; Stephens & Malone, 2009). Indeed, social media in general and blogs in particular are instrumental for providing emotional support during and after disasters (Macias, Hilyard, & Freimuth, 2009; PEJ New Media Index, 2011). Additionally, social media in general and Twitter in particular can aid healing, as research finds during both natural disasters, such as Hurricane Katrina (Procopio & Procopio, 2007), and man-made disasters, such as the July 2011 attacks in Oslo, Norway (Perng et al., 2012).
  19. http://www.buzzfeed.com/annanorth/how-social-media-is-aiding-the-hurricane-sandy-rec (Facebook help during Hurricane Sandy) http://blog.twitter.com/2012/10/hurricane-sandy-resources-on-twitter.html (Twitter page for Hurricane Sandy) http://www.treehugger.com/culture/12-ways-help-hurricane-sandy-relief-efforts.html Categorization of severity based on weather conditions. Actionable information is contextually dependent.
  20. http://news.cnet.com/8301-1023_3-57541566-93/report-twitter-hits-half-a-billion-tweets-a-day/ http://semiocast.com/en/publications/2012_07_30_Twitter_reaches_half_a_billion_accounts_140m_in_the_US Let me consider one small example of how social data (in turn data) can help people during disasters. Data becomes smart data if it takes the recipient into account (context). Sensor data for emergency responders: who in the population needs immediate attention? (1) Location (2) Severity (3) Health condition. Need for abstraction: semantic perception needs abstraction. 90 + Heart Problem: Don’t run out. 23: Run out.
  21. http://news.cnet.com/8301-1023_3-57541566-93/report-twitter-hits-half-a-billion-tweets-a-day/ http://semiocast.com/en/publications/2012_07_30_Twitter_reaches_half_a_billion_accounts_140m_in_the_US http://www.internews.org/sites/default/files/resources/InternewsEurope_Report_Japan_Connecting%20the%20last%20mile%20Japan_2013.pdf Let me consider one small example of how social data (in turn data) can help people during disasters. Data becomes smart data if it takes the recipient into account and changes context accordingly. Sensor data for emergency responders: who in the population needs immediate attention? (1) Location (2) Severity (3) Health condition. Need for abstraction: semantic perception needs abstraction. 90 + Heart Problem: Don’t run out. 23: Run out.
  22. http://news.cnet.com/8301-1023_3-57541566-93/report-twitter-hits-half-a-billion-tweets-a-day/ http://semiocast.com/en/publications/2012_07_30_Twitter_reaches_half_a_billion_accounts_140m_in_the_US Let me consider one small example of how social data (in turn data) can help people during disasters. Data becomes smart data if it takes the recipient into account and changes context accordingly. Sensor data for emergency responders: who in the population needs immediate attention? (1) Location (2) Severity (3) Health condition. Need for abstraction: semantic perception needs abstraction. 90 + Heart Problem: Don’t run out. 23: Run out.
  23. http://www.buzzfeed.com/jackstuef/the-man-behind-comfortablysmug-hurricane-sandys During the storm last night, user @comfortablysmug was the source of a load of frightening but false information about conditions in New York City that spread wildly on Twitter and onto news broadcasts before Con Ed, the MTA, and Wall Street sources had to take time out of the crisis situation to refute them.
  24. We face challenges like these with data every time. The most important thing is what you aim to do with the data, that is, what value you intend to provide from it.
  25. http://www.wired.com/insights/2013/04/big-data-fast-data-smart-data/
  26. http://www.wired.com/insights/2013/04/big-data-fast-data-smart-data/
  27. -- Contextual Questioning – Potential Information needed from Humans
  28. Larry Smarr is a professor at the University of California, San Diego, and he was diagnosed with Crohn's disease. What’s interesting about this case is that Larry diagnosed himself. He is a pioneer in the area of Quantified Self, which uses sensors to monitor physiological symptoms. Through this process he discovered inflammation, which led him to the discovery of Crohn's disease. This type of self-tracking is becoming more and more common.
  29. A massive amount of data will be collected by sensors and mobile devices, yet patients and doctors care about “actionable” information. This data has all four Vs of big data, and we used knowledge-enabled techniques to transform it into value. In the context of PD, we analyzed a massive amount of data collected by sensors on smartphones to understand detection and characterization of PD severity.
  30. Main idea: Prior knowledge of PD was used to facilitate its detection from massive sensor data by reducing the search space. Details: Declarative knowledge of PD includes PD severity levels and their symptoms, as shown in the logical rule above. Each PD severity level is a conjunction of a set of PD symptoms. Each symptom was mapped to its manifestation in sensor observations. The availability of declarative knowledge significantly improved the analytics by aiding the feature selection process. The graphs above contrast the physical movements and voice of two control group members and two PD patients.
  31. kHealth: http://www.youtube.com/watch?v=btnRi64hJp4 EventShop: http://www.slideshare.net/jain49/eventshop-120721, http://dl.acm.org/citation.cfm?id=2488175
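The rule structure described in this note (a severity level as a conjunction of symptoms, each symptom mapped to a sensor manifestation) can be sketched roughly as follows. This is only an illustrative sketch: the symptom names, sensor feature names, thresholds and severity levels are all hypothetical and do not come from the slides.

```python
# Hypothetical mapping: symptom -> (sensor feature, test on that feature).
# Names and thresholds are invented for illustration only.
SYMPTOMS = {
    "tremor": ("accel_variance", lambda v: v > 0.5),
    "slow_movement": ("mean_speed", lambda v: v < 0.3),
    "soft_voice": ("voice_db", lambda v: v < 50.0),
}

# Hypothetical severity rules: each level is a conjunction (set) of symptoms.
SEVERITY_RULES = {
    "mild": {"tremor"},
    "moderate": {"tremor", "slow_movement"},
    "severe": {"tremor", "slow_movement", "soft_voice"},
}


def detect_symptoms(obs):
    """Return the symptoms whose sensor manifestation is present in obs."""
    return {name for name, (feature, test) in SYMPTOMS.items() if test(obs[feature])}


def severity_levels(obs):
    """Return every severity level whose symptoms are all present."""
    present = detect_symptoms(obs)
    return [level for level, required in SEVERITY_RULES.items() if required <= present]


def features_needed(level):
    """Sensor features a rule depends on: the knowledge-driven feature selection."""
    return {SYMPTOMS[symptom][0] for symptom in SEVERITY_RULES[level]}


obs = {"accel_variance": 0.9, "mean_speed": 0.2, "voice_db": 60.0}
print(severity_levels(obs))
print(features_needed("moderate"))
```

The point of `features_needed` is that the declarative rule tells the analysis which sensor streams matter for a given question, so the remaining features never have to be searched.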
  32. - what if we could automate this sense making ability?- and what if we could do this at scale?
  33. sense making based on human cognitive models
  34. The perception cycle contains two primary phases. Explanation: translating low-level signals into high-level abstractions; inference to the best explanation. Discrimination: focusing attention on those properties that will help distinguish between multiple possible explanations; used to intelligently task sensors and collect additional observations (rather than the brute-force approach of blindly collecting all observations).
  35. The perception cycle contains two primary phases. Explanation: translating low-level signals into high-level abstractions; inference to the best explanation. Discrimination: focusing attention on those properties that will help distinguish between multiple possible explanations; used to intelligently task sensors and collect additional observations (rather than the brute-force approach of blindly collecting all observations).
  36. A single-feature (disease) assumption means that all the observed properties (symptoms) must be explained by a single feature, i.e., this framework is not expressive enough to model comorbidity, where more than one feature (disease) may co-exist. For example, if there are two diseases causing disjoint symptoms, and all the symptoms of both diseases are observed, then this framework will not be able to find a coverage and returns no diseases.
  37. The perception cycle contains two primary phases. Explanation: translating low-level signals into high-level abstractions; inference to the best explanation. Discrimination: focusing attention on those properties that will help distinguish between multiple possible explanations; used to intelligently task sensors and collect additional observations (rather than the brute-force approach of blindly collecting all observations).
  38. With this ability, many problems could be solved. For example, we could help solve health problems (before they become serious health problems) through monitoring symptoms and real-time sense making, acting as an early warning system to detect problematic health conditions.
  39. Intelligence distributed at the edge of the network. Requires resource-constrained devices (mobile phones, gateway nodes, etc.) to be able to utilize SW technologies.
  40. Intelligence distributed at the edge of the network. Requires resource-constrained devices (mobile phones, gateway nodes, etc.) to be able to utilize SW technologies. Henson et al., 'An Efficient Bit Vector Approach to Semantics-based Machine Perception in Resource-Constrained Devices', ISWC 2012.
  41. Compute machine perception inferences (i.e., explanation and discrimination) of high complexity on resource-constrained devices in milliseconds. Difference between the other systems and what this system provides.
  42. Intelligence at the edge. Shipping computation and domain models to the edge (distributed).
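The limitation described in this note can be made concrete with a small sketch of explanation under the single-feature assumption. The knowledge base below (disease names and symptoms) is hypothetical and chosen only to reproduce the comorbidity case; it is not the framework's actual implementation.

```python
# Hypothetical knowledge base: feature (disease) -> properties (symptoms) it explains.
KB = {
    "flu": {"fever", "cough"},
    "allergy": {"sneezing", "itchy_eyes"},
    "cold": {"cough", "sneezing"},
}


def explain_single_feature(observed, kb=KB):
    """Return the features that, on their own, explain every observed property."""
    return sorted(feature for feature, props in kb.items() if observed <= props)


# One observed symptom: several single diseases remain possible explanations.
print(explain_single_feature({"cough"}))

# Symptoms of two diseases with disjoint symptom sets (flu and allergy):
# no single disease covers them all, so the result is empty.
print(explain_single_feature({"fever", "cough", "sneezing", "itchy_eyes"}))
```

Handling comorbidity would require searching over sets of features whose combined properties cover the observations, which this single-feature check deliberately does not attempt.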
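The discrimination phase can be sketched as choosing which not-yet-observed properties are worth measuring next. This is a minimal sketch under an assumed, hypothetical knowledge base: a property is treated as discriminating when some, but not all, of the current candidate explanations account for it.

```python
# Hypothetical knowledge base: feature (disease) -> properties (symptoms) it explains.
KB = {
    "flu": {"fever", "cough"},
    "cold": {"cough", "sneezing"},
}


def discriminating_properties(candidates, observed, kb=KB):
    """Unobserved properties explained by some, but not all, candidate features."""
    prop_sets = [kb[feature] for feature in candidates]
    if not prop_sets:
        return []
    explained_by_any = set().union(*prop_sets)
    explained_by_all = set.intersection(*prop_sets)
    return sorted((explained_by_any - explained_by_all) - observed)


# "cough" alone cannot separate flu from cold; checking for fever or sneezing can.
print(discriminating_properties(["flu", "cold"], {"cough"}))
```

Only the returned properties need to be sensed next, which is the contrast with blindly collecting every observation.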
  43. “to help software reusability in order to allow new applications to be built faster and to share innovations (software components, novel approaches) amongst software developers” “to standardize and commoditize back-end data stores so client software may access any Open mHealth-compliant data store in a uniform way (interoperability)” “to produce examples and documentation of these concepts meaningfully and simply”
  44. Observe data from different sensors at the same time.
  45. System Architecture. Fig. shows an overview of the SemHealth architecture. Sensors: all are Bluetooth sensors already utilized by the current k-Health application to measure weight, heart rate, and blood pressure. Android application: reads sensor observations through Bluetooth; performs annotation on observations and generates percepts from those observations; uploads annotated observations and percepts to the server-side data store; retrieves data using the DSU API and feeds data to the DPU and/or DVU APIs; visualizes data through the DVU API (considered a “nice to have”, as existing visualization may be used as-is; will utilize an existing graphing library for Android with an Open mHealth-style API that may be translated to the browser at a later time). Server side: Open mHealth-compliant DSU and DPU APIs; triple data storage replaces the existing SQLite database in the k-Health application; the existing k-Health reasoner is now the brains behind the DPU.
  46. http://www.flickr.com/photos/twitteroffice/5897088517/sizes/o/in/photostream/ http://bayarea.sbnation.com/49ers/2013/2/3/3947738/super-bowl-prop-bets-2013-twitter http://expandedramblings.com/index.php/march-2013-by-the-numbers-a-few-amazing-twitter-stats/
  47. Much of the early work in Big data is being done with focusing on uni-directional among XYZ.
  48. http://semanticweb.com/picking-the-president-twindex-twitris-track-social-media-electorate_b31249 http://semanticweb.com/election-2012-the-semantic-recap_b33278
  49. http://knoesis.wright.edu/library/resource.php?id=1787
  50. Categorization of severity based on weather conditions. Actionable information is contextually dependent.
  51. (1.5 minutes) Alright, so let’s motivate with this situation during an emergency. Various actors: resource seekers, responder teams, resource providers at remote sites. Each of these actor groups has questions: needs, providers, responders (wondering!). Here we have a social network to connect these actors and bridge the gap as a communication platform, but its potential use for effective help is yet to be realized, because... (next slide)
  52. Talk about what kind of smart data we provide that helps the actions of crisis response coordination.
  53. Source: Purohit et. al 2013 (https://docs.google.com/a/knoesis.org/document/d/1aBJ2egHICUwaWxR8jOoTIUfEYj1QAnUt0q7haIKoYGY/edit# , http://www.knoesis.org/library/resource.php?id=1865)
  54. http://twitris.knoesis.org/oklahomatornado
  55. (It is real-time widget for monitoring of needs, so will not be active after the event has passed) http://twitris.knoesis.org/oklahomatornado
  56. Highly rich interface for response team
  57. Definition of the event. US Elections and some changes/subevents: primaries, debates, people/places/organizations involved in the event. Arab Spring: subevents during those, Egypt protests.
  58. Explain about continuous semantics
  59. Pucher, J., Korattyswaroopam, N., & Ittyerah, N. (2004). The crisis of public transport in India: Overwhelming needs but limited resources. Journal of Public Transportation, 7(4), 1-30.
  60. Point of this slide: correlations
  61. Point of this slide: heterogeneity and uncertainty
  62. A single observation of slow moving traffic may have multiple explanations.
  63. Internal observations are limited to whatever the on-road sensors can observe. In the 511.org data we have analyzed, the internal observations are mentioned above. External events are obtained from sources beyond the on-road sensors, e.g., some agency like 511.org which reports traffic incidents. Note that internal observations are mostly from machine sensors, while external events are mostly textual observations. The analogy in healthcare will be: internal observations are on-body sensors such as heart rate and temperature; external events are jogging, walking, taking stairs.
  64. e.g. equation for projectile motion may not precisely compute the actual projectile. Air resistance may have been ignored
  65. Use of open data for parameters is promising and can be explored as future research.
  66. Some facts about the traffic domain obtained from ConceptNet5. The types of events are obtained by using the comprehensive subsumption relationship from 511.org. We propose to use such knowledge to complement the PGM structure learning algorithms. CapableOf(traffic jam, occur twice each day); CapableOf(traffic jam, slow traffic); RelatedTo(accident, car crash); Causes(road ice, accident); CapableOf(bad weather, slow traffic); HasSubevent(go to concert, car crash); Causes(go to baseball game, traffic jam); Causes(traffic accident, traffic jam); BadWeather(road ice); BadWeather(bad weather); ScheduledEvent(go to concert); ScheduledEvent(go to baseball game); ActiveEvent(traffic accident); Delay(slow traffic); Delay(traffic jam); TimeOfDay(occur twice each day)
  67. Declarative knowledge + statistical correlation. This slide illustrates the three operations to enrich the correlation structure extracted using statistical methods. These operations utilize declarative knowledge from ConceptNet5, as shown in each step.
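The three operations (add missing random variables, add missing links, add link direction) can be sketched as follows. The facts and the concept-to-variable typing are taken from the ConceptNet 5 note; the starting structure, the `enrich` function, and the decision to read both `Causes` and `CapableOf` as "subject influences object" are assumptions made for illustration and are not the authors' algorithm.

```python
# Facts copied from the ConceptNet 5 note: (relation, subject, object).
FACTS = [
    ("Causes", "go to baseball game", "traffic jam"),
    ("Causes", "traffic accident", "traffic jam"),
    ("CapableOf", "bad weather", "slow traffic"),
    ("CapableOf", "traffic jam", "slow traffic"),
]

# Concept -> random variable, following the type assertions in the same note.
CONCEPT_TO_VAR = {
    "go to baseball game": "ScheduledEvent",
    "traffic accident": "ActiveEvent",
    "bad weather": "BadWeather",
    "traffic jam": "Delay",
    "slow traffic": "Delay",
}

# Assumption: both relations are read as "subject influences object".
DIRECTIONAL = {"Causes", "CapableOf"}


def enrich(nodes, undirected_edges, facts=FACTS, concept_to_var=CONCEPT_TO_VAR):
    """Apply the three operations to a statistically extracted structure."""
    nodes = set(nodes)
    links = {frozenset(edge) for edge in undirected_edges}
    directed = set()
    for relation, subject, obj in facts:
        if relation not in DIRECTIONAL:
            continue
        src, dst = concept_to_var[subject], concept_to_var[obj]
        if src == dst:
            continue  # both concepts map to the same variable, nothing to add
        nodes.update((src, dst))          # 1. add missing random variables
        links.add(frozenset((src, dst)))  # 2. add missing links
        directed.add((src, dst))          # 3. add link direction
    undirected = links - {frozenset(edge) for edge in directed}
    return nodes, directed, undirected


# Hypothetical structure extracted from data: no BadWeather node, no directions.
nodes, directed, undirected = enrich(
    {"ScheduledEvent", "ActiveEvent", "Delay", "TravelTime"},
    [("ScheduledEvent", "Delay"), ("Delay", "TravelTime")],
)
print(sorted(nodes))
print(sorted(directed))
print([sorted(edge) for edge in undirected])
```

Links for which the knowledge base says nothing (here Delay and TravelTime) are left undirected rather than guessed.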
  68. The statistical correlation structure is shown above; the enriched structure is shown below. The enrichment of the graphical model will potentially allow us to capture the domain more precisely and also improve our predictions, as the model would get closer to the underlying probability distribution in the real world. The log-likelihood score is one way of quantifying how good a structure is based on the observed data. There may be many candidate structures extracted from data which result in the same log-likelihood score. Declarative knowledge will help us ground statistical models in reality, which will allow us to pick one structure over the other. Pramod Anantharam, Krishnaprasad Thirunarayan and Amit Sheth, 'Traffic Analytics using Probabilistic Graphical Models Enhanced with Knowledge Bases,' 2nd International Workshop on Analytics for Cyber-Physical Systems (ACS-2013) at SIAM International Conference on Data Mining (SDM13), pp. 13--20, Texas, USA, May 2-4, 2013. We stopped at structure extraction for our workshop paper (SIAM ACS workshop) since the declarative knowledge we used (ConceptNet5) and the statistical model (nodes and edges) are at the same level of abstraction.
  69. More at: http://wiki.knoesis.org/index.php/PCS and http://knoesis.org/projects/ssw/
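To illustrate why a log-likelihood score alone may not settle the choice of structure, here is a minimal sketch that scores discrete data under a DAG using maximum-likelihood parameters. The observations are made up for illustration, and the scoring function is a textbook formulation rather than the code used in the cited paper.

```python
import math
from collections import Counter


def log_likelihood(data, parents):
    """Log-likelihood of discrete data under a DAG with maximum-likelihood parameters.

    data: list of dicts mapping variable name -> value.
    parents: dict mapping each variable to a tuple of its parent variables.
    """
    total = 0.0
    for var, pa in parents.items():
        joint = Counter((tuple(row[p] for p in pa), row[var]) for row in data)
        marginal = Counter(tuple(row[p] for p in pa) for row in data)
        for (pa_values, _), count in joint.items():
            total += count * math.log(count / marginal[pa_values])
    return total


# Made-up observations: bad weather and delay tend to occur together.
data = (
    [{"BadWeather": 1, "Delay": 1}] * 3
    + [{"BadWeather": 1, "Delay": 0}]
    + [{"BadWeather": 0, "Delay": 0}] * 3
    + [{"BadWeather": 0, "Delay": 1}]
)

weather_to_delay = {"BadWeather": (), "Delay": ("BadWeather",)}
delay_to_weather = {"Delay": (), "BadWeather": ("Delay",)}
independent = {"BadWeather": (), "Delay": ()}

for name, structure in [
    ("BadWeather -> Delay", weather_to_delay),
    ("Delay -> BadWeather", delay_to_weather),
    ("no link", independent),
]:
    print(name, round(log_likelihood(data, structure), 3))
```

Both directed structures fit the data equally well, so the data cannot say which way the arrow points; a fact such as CapableOf(bad weather, slow traffic) is the kind of declarative knowledge that could favour BadWeather -> Delay.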