
Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications


Presentation at the AAAI 2013 Fall Symposium on Semantics for Big Data, Arlington, Virginia, November 15-17, 2013

Additional related material at: http://wiki.knoesis.org/index.php/Smart_Data
Related paper at: http://www.knoesis.org/library/resource.php?id=1903

Abstract: We discuss the nature of Big Data and address the role of semantics in analyzing and processing Big Data that arises in the context of Physical-Cyber-Social Systems. We organize our research around the five V's of Big Data, where four of the V's are harnessed to produce the fifth V, value. To handle the challenge of Volume, we advocate semantic perception, which converts low-level observational data into higher-level abstractions more suitable for decision-making. To handle the challenge of Variety, we use semantic models and annotations of data so that much of the intelligent processing can be done at a level independent of the heterogeneity of data formats and media. To handle the challenge of Velocity, we seek to use continuous semantics capabilities to dynamically create event- or situation-specific models and to recognize new concepts, entities, and facts. To handle Veracity, we explore the formalization of trust models and approaches to glean trustworthiness. These four V's of Big Data are harnessed by semantics-empowered analytics to derive Value for practical applications transcending the physical-cyber-social continuum.


  1. Semantics-empowered Big Data Processing for PCS Applications. Krishnaprasad Thirunarayan (T. K. Prasad) and Amit Sheth, Kno.e.sis – Ohio Center of Excellence in Knowledge-enabled Computing, Wright State University, Dayton, OH 45435
  2. Outline: • 5 V’s of Big Data Research • Semantic Perception for Scalability • Lightweight semantics to manage heterogeneity – Cost-benefit trade-off and continuum • Hybrid Knowledge Representation and Reasoning – Anomaly, Correlation, Causation. 11/15/2013 Prasad
  3. 5 V’s of Big Data Research: Volume, Velocity, Variety, Veracity, Value. Big Data => Smart Data
  4. Volume: Assorted Examples. Check engine light analogy
  5. Volume: Semantic Perception
  6. Weather Use Case
  7. Parkinson’s Disease Use Case
  8. Heart Failure Use Case
  9. Asthma Use Case
  10. Traffic Use Case
  11. Heterogeneity in a Physical-Cyber-Social System [diagram labels: slow-moving traffic; link description; scheduled event; 511.org schedule information; 511.org traffic monitoring]
  12. Volume with a Twist: Resource-constrained reasoning on mobile devices
  13. Perception Cycle* that exploits background knowledge / domain models. Diagram labels: Observe Property, Perceive Feature, Prior Knowledge. (1) Explanation: abstracting raw data for human comprehension. (2) Discrimination: focus generation for disambiguation and action (incl. human in the loop). *Based on Neisser’s cognitive model of perception.
  14. Virtues of Our Approach to Semantic Perception: Blends simplicity, effectiveness, and scalability. • Declarative specification of explanation and discrimination; • Interdisciplinary applications of contemporary relevance (e.g., to healthcare); • Encodings/algorithms that are significant (asymptotic, order-of-magnitude gain) and necessary (“tractable” due to time/memory reduction for typical problem sizes); and • Prototypes on extant PCs and mobile devices.
  15. Efficiency Improvement (evaluation on a mobile device): • Problem size increased from tens to thousands of nodes • Time reduced from minutes to milliseconds • Complexity growth reduced from polynomial ($O(n^3) < x < O(n^4)$) to linear ($O(n)$)
  16. Volume and Velocity: • Lightweight semantics-based adaptive/continuous filtering (disaster response use case) • Building domain models dynamically
  17. Dynamic Model Creation: Continuous Semantics
  18. Variety: Syntactic and semantic heterogeneity • in textual and sensor data • in (legacy) materials data • in (long-tail) geosciences data
  19. Variety (What?): Materials/Geosciences Use Case • Structured data (e.g., relational) • Semi-structured, heterogeneous documents (e.g., publications and technical specs, which usually include text, numerics, maps, and images) • Tabular data (e.g., ad hoc spreadsheets and complex tables incorporating “irregular” entries)
  20. Variety (How?/Why?): Granularity of Semantics and Applications (cost-benefit trade-off and continuum) • Lightweight semantics: file- and document-level annotation to enable discovery and sharing • Richer semantics: data-level annotation and extraction for semantic search and summarization • Fine-grained semantics: data integration, interoperability, and reasoning in Linked Open Data
  21. Challenges Associated with a Typical Spreadsheet/Table • Meant for human consumption • Irregular: not a simple rectangular grid • Heterogeneous: rows are not all interpreted the same way • Complex: the meaning of each row and each column is context dependent • Footnotes modify the meaning of entries (esp. in materials and process specifications)
  22. [Image-only slide; no text recoverable]
  23. Practical Semi-Automatic Content Extraction • DESIGN: Develop regular data structures that can be used to formalize tabular information – Provide a natural expression of data – Provide semantics to data, thereby removing potential ambiguities – Enable automatic translation • USE: Manual population of regular tables and automatic translation into LOD
  24. Variety (What?): Sensor Data Use Case. Develop/learn domain models to exploit complementary and corroborative information • to relate patterns in multimodal data to a “situation” • to integrate machine-sensed and human-sensed data
  25. Variety: Hybrid KRR. Blending data-driven models with declarative knowledge – Data-driven: bottom-up, correlation-based, statistical – Declarative: top-down, causal/taxonomical, logical – Refine structure to better estimate parameters. E.g., traffic analytics using PGMs + KBs
  26. Variety (Why?): Hybrid KRR. “Data can help compensate for our overconfidence in our own intuitions and reduce the extent to which our desires distort our perceptions.” (David Brooks, The New York Times) However, to inspire confidence, inferred correlations require clear justification that they are not coincidental.
  27. Correlations vs Causation vs Anomalies • Correlations due to common cause or origin • Coincidental due to data skew or misrepresentation • Coincidental new discovery • Strong correlation vs causation • Anomalous and accidental • Correlation turning into causation
  28. Correlations vs Causation vs Anomalies • Correlations due to common cause or origin – E.g., Planets: Copernicus > Kepler > Newton > Einstein • Coincidental due to data skew or misrepresentation – E.g., Tall policy claims made by politicians! • Coincidental new discovery – E.g., Hurricanes and Strawberry Pop-Tarts sales • Strong correlation vs causation – E.g., Spicy foods vs Helicobacter pylori: Stomach ulcers • Anomalous and accidental – E.g., CO2 levels and Obesity • Correlation turning into causation – E.g., Pavlovian learning: conditioned reflex
  30. Veracity: A lot of existing work on trust ontologies, metrics, and models, and on provenance tracking • Homogeneous data: statistical techniques • Heterogeneous data: semantic models
  31. Veracity • Machine sensing: objective, quantitative, but prone to environmental effects, battery life, … • Human sensing: subjective, qualitative, but prone to bias, perceptual errors, rumors, … • Open problem: improving trustworthiness by combining machine sensing and human sensing – E.g., 2002 Überlingen mid-air collision: a pilot followed the air traffic controller’s instruction instead of the electronic TCAS system’s recommendation
  32. (More on) Value: Learning domain models from “big data” for prediction. E.g., Harnessing Twitter "Big Data" for Automatic Emotion Identification
  33. (More on) Value: Discovering gaps and enriching domain models using data. E.g., a data-driven knowledge acquisition method for domain knowledge enrichment in healthcare
  34. Conclusions • A glimpse of our research organized around the 5 V’s of Big Data • Discussed the role of each in harnessing Value – Semantic Perception (Volume) – Continuum of semantic models to manage heterogeneity (Variety) – Hybrid KRR: probabilistic + logical (Variety) – Continuous Semantics (Velocity) – Trust Models (Veracity)
  35. Thank you! Please visit us at http://knoesis.org/ (Kno.e.sis – Ohio Center of Excellence in Knowledge-enabled Computing, Wright State University, Dayton, Ohio, USA). Special thanks to: Pramod Anantharam and Cory Henson
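Slides 13 and 14 describe perception as explanation followed by discrimination over background knowledge. Below is a minimal sketch of one plausible reading. It assumes a knowledge base that maps each feature to the properties that feature is expected to exhibit, and the weather features and property names are invented for illustration. In this reading, explanation keeps the features that account for every observed property. Discrimination then proposes the properties that would tell the remaining candidates apart.

```python
from typing import Dict, Set

# Hypothetical background knowledge (illustrative only): each feature maps to
# the set of observable properties it is expected to exhibit.
KB: Dict[str, Set[str]] = {
    "blizzard": {"low_temperature", "high_wind", "heavy_snow"},
    "flurries": {"low_temperature", "light_snow"},
    "windstorm": {"high_wind"},
}


def explain(observed: Set[str], kb: Dict[str, Set[str]]) -> Set[str]:
    """Explanation: features whose expected properties cover every observed property."""
    return {feature for feature, props in kb.items() if observed <= props}


def discriminate(candidates: Set[str], kb: Dict[str, Set[str]]) -> Set[str]:
    """Discrimination: properties expected by some, but not all, candidate features.
    Observing (or ruling out) one of these narrows the set of explanations."""
    if not candidates:
        return set()
    expected = [kb[feature] for feature in candidates]
    return set().union(*expected) - set.intersection(*expected)


# One turn of the cycle: observe, explain, then focus on what to look at next.
observed = {"high_wind"}
candidates = explain(observed, KB)    # {"blizzard", "windstorm"}
focus = discriminate(candidates, KB)  # {"low_temperature", "heavy_snow"}
print(sorted(candidates), sorted(focus))
```

The focus set is what a device (or a human in the loop) would check next. Observing `heavy_snow` would, for example, leave `blizzard` as the only explanation.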
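Slides 16 and 17 mention lightweight, semantics-based adaptive filtering and dynamic model creation for disaster response. This is only a rough sketch. It assumes the "model" is simply a set of tracked terms. It also assumes a hypothetical expansion rule: a hashtag that co-occurs often enough with tracked terms joins the set. This rule is not necessarily the one used in the authors' system. The idea it illustrates is that the filter evolves along with the stream, and the tweets are invented.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple


def adaptive_filter(tweets: Iterable[str], seeds: Set[str], min_count: int = 2) -> Tuple[List[str], Set[str]]:
    """Filter a stream with a term set that grows over time (hypothetical rule):
    a hashtag seen in at least `min_count` matched tweets is added to the filter."""
    tracked = {s.lower() for s in seeds}
    counts: Counter = Counter()
    matched: List[str] = []
    for tweet in tweets:
        tokens = {t.lower().strip(".,!?") for t in tweet.split()}
        if not tokens & tracked:
            continue  # irrelevant under the current model
        matched.append(tweet)
        for tag in tokens - tracked:
            if tag.startswith("#"):
                counts[tag] += 1
                if counts[tag] >= min_count:
                    tracked.add(tag)  # the model evolves as the event unfolds
    return matched, tracked


stream = [
    "Flooding near the pier #sandy",
    "Power out downtown #sandy #nopower",
    "Shelter open at the school #nopower",
    "Lunch was great",
]
print(adaptive_filter(stream, {"#sandy"}, min_count=1))
```

Raising `min_count` trades recall for protection against drifting onto unrelated hashtags that merely co-occur with tracked terms.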
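Slide 23 proposes populating regular tables manually and translating them automatically into Linked Open Data. Below is a minimal sketch of the translation step, serialized as N-Triples. It assumes each table row becomes a subject, each column header becomes a predicate, and each non-empty cell becomes a plain string literal. The `http://example.org/materials/` namespace and the sample row are hypothetical.

```python
from typing import Dict, List
from urllib.parse import quote

BASE = "http://example.org/materials/"  # hypothetical namespace, not from the slides


def escape_literal(value: str) -> str:
    """Escape characters that N-Triples does not allow raw inside a string literal."""
    return (value.replace("\\", "\\\\").replace('"', '\\"')
                 .replace("\n", "\\n").replace("\r", "\\r"))


def row_to_ntriples(row_id: str, row: Dict[str, str]) -> List[str]:
    """Translate one row of a regular table: row -> subject IRI,
    column header -> predicate IRI, non-empty cell -> plain string literal."""
    subject = f"<{BASE}row/{quote(row_id)}>"
    triples = []
    for column, value in row.items():
        if value == "":
            continue  # leave empty cells unasserted
        predicate = f"<{BASE}prop/{quote(column)}>"
        triples.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return triples


# Illustrative row only.
row = {"material": "Ti-6Al-4V", "density_g_cm3": "4.43", "note": ""}
for line in row_to_ntriples("alloy-1", row):
    print(line)
```

A fuller pipeline would map headers to ontology terms and attach datatypes and units, rather than minting one predicate per column.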
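Slide 25 blends bottom-up, correlation-based evidence with top-down declarative knowledge. The toy sketch below shows that division of labor. It assumes the knowledge base is a set of directed causal links. It uses plain Pearson correlation as the data-driven signal, standing in for the probabilistic graphical models the slide mentions. The traffic variables and values are invented.

```python
import math
from itertools import combinations
from typing import Dict, List, Set, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant series carries no correlation signal
    return sxy / math.sqrt(sxx * syy)


def hybrid_edges(data: Dict[str, List[float]], kb_edges: Set[Tuple[str, str]], threshold: float = 0.7):
    """Bottom-up: propose links between strongly correlated variables.
    Top-down: accept a link (with its direction) only if the KB asserts it;
    strong correlations the KB cannot account for are set aside for review."""
    accepted, unexplained = [], []
    for a, b in combinations(sorted(data), 2):
        if abs(pearson(data[a], data[b])) < threshold:
            continue
        if (a, b) in kb_edges:
            accepted.append((a, b))
        elif (b, a) in kb_edges:
            accepted.append((b, a))
        else:
            unexplained.append((a, b))
    return accepted, unexplained


# Invented traffic-style data: rain lowers speed and raises congestion.
data = {
    "rain":       [0, 1, 1, 0, 1, 0],
    "speed":      [60, 40, 40, 60, 40, 60],
    "congestion": [0.1, 0.9, 0.9, 0.1, 0.9, 0.1],
    "weekday":    [1, 1, 0, 0, 1, 0],
}
kb = {("rain", "speed"), ("rain", "congestion")}  # hypothetical causal links
print(hybrid_edges(data, kb))
```

In this example, `speed` and `congestion` are strongly correlated only because both depend on `rain`. The knowledge base leaves that pair unexplained rather than promoting it to a causal link, which is the kind of justification slide 26 asks for.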
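Slides 27 and 28 list "correlations due to common cause or origin" first. One standard check assumes the candidate common cause $z$ is measured and computes the first-order partial correlation:

$r_{xy\cdot z} = (r_{xy} - r_{xz} r_{yz}) / \sqrt{(1 - r_{xz}^2)(1 - r_{yz}^2)}$

If this value falls near zero once $z$ is controlled for, the raw correlation between $x$ and $y$ is plausibly explained by $z$. The invented data below are constructed so that this happens exactly.

```python
import math
from typing import List


def pearson(x: List[float], y: List[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x: List[float], y: List[float], z: List[float]) -> float:
    """First-order partial correlation of x and y, controlling for z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


# Invented data: x and y both track a common cause z plus mutually unrelated noise.
z = [1, 2, 3, 4, 5, 6]
x = [2, 1, 2, 5, 5, 6]  # z + [1, -1, -1, 1, 0, 0]
y = [2, 3, 1, 2, 6, 7]  # z + [1, 1, -2, -2, 1, 1]
print(round(pearson(x, y), 2), round(partial_corr(x, y, z), 2))
```

A near-zero partial correlation is consistent with a common cause but does not prove one. An unmeasured confounder or a nonlinear relationship would not show up in this test.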
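Slides 30 and 31 leave open how to combine machine and human sensing according to trustworthiness. The sketch uses a beta-reputation score, which is one common trust metric and an assumption here rather than the authors' model. Under this score, a source with $c$ confirmed and $r$ refuted past reports gets trust $(c + 1)/(c + r + 2)$. Each report is then weighted by its source's score. The source names and histories are invented.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Source:
    name: str
    confirmed: int = 0  # past reports later confirmed
    refuted: int = 0    # past reports later refuted

    def trust(self) -> float:
        """Mean of Beta(confirmed + 1, refuted + 1): 0.5 with no history."""
        return (self.confirmed + 1) / (self.confirmed + self.refuted + 2)


def fuse(reports: List[Tuple[Source, bool]]) -> float:
    """Trust-weighted support, in [0, 1], for a claim reported by several sources."""
    total = sum(src.trust() for src, _ in reports)
    if total == 0:
        return 0.5  # no reports: no evidence either way
    return sum(src.trust() for src, claim in reports if claim) / total


# Invented histories: a road sensor and a human reporter disagree about congestion.
sensor = Source("road_sensor", confirmed=9, refuted=1)
person = Source("citizen_report", confirmed=1, refuted=3)
print(round(fuse([(sensor, True), (person, False)]), 3))
```

The slide 31 example is a reminder that no source type deserves fixed precedence. Here the weights come from each source's own track record rather than from whether it is a machine or a person.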
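Slide 32 cites learning emotion models from Twitter data. One common way to obtain training labels at that scale is distant supervision, which uses an emotion hashtag as a noisy label. The sketch shows only that labeling step. It assumes a hypothetical hashtag-to-emotion map and the rule that the hashtag must end the tweet. The resulting (text, label) pairs would then feed any standard text classifier.

```python
import re
from typing import Optional, Tuple

# Hypothetical hashtag-to-emotion map (illustrative only).
EMOTION_TAGS = {"#happy": "joy", "#excited": "joy", "#sad": "sadness",
                "#angry": "anger", "#scared": "fear"}
HASHTAG = re.compile(r"#\w+")


def distant_label(tweet: str) -> Optional[Tuple[str, str]]:
    """Treat an emotion hashtag at the very end of a tweet as a noisy label, and
    strip it from the text so a classifier cannot simply memorize the label word."""
    text = tweet.rstrip()
    tags = HASHTAG.findall(text)
    if not tags or not text.endswith(tags[-1]):
        return None
    label = EMOTION_TAGS.get(tags[-1].lower())
    if label is None:
        return None
    return text[: -len(tags[-1])].rstrip(), label


for t in ["Finally got the job offer #excited", "#sad day", "Heading to work #monday"]:
    print(distant_label(t))
```

Requiring the hashtag at the end is a heuristic. A hashtag used mid-sentence is more often part of the content than a statement of the author's emotion.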
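Slide 33 describes using data to find gaps in a domain model. The minimal sketch below assumes that records are already annotated with concepts and that the knowledge base is a set of related concept pairs. It counts how often two concepts co-occur in the same record. Frequent pairs the knowledge base does not yet relate are surfaced as candidates for expert review. The clinical concepts and records are invented.

```python
from collections import Counter
from itertools import combinations
from typing import FrozenSet, List, Set, Tuple


def candidate_gaps(records: List[Set[str]], kb_pairs: Set[FrozenSet[str]],
                   min_support: int = 2) -> List[Tuple[Tuple[str, ...], int]]:
    """Count how often two concepts co-occur in the same record and return
    frequent pairs that the knowledge base does not already relate."""
    counts: Counter = Counter()
    for concepts in records:
        for a, b in combinations(sorted(concepts), 2):
            counts[frozenset((a, b))] += 1
    gaps = [(tuple(sorted(pair)), n) for pair, n in counts.items()
            if n >= min_support and pair not in kb_pairs]
    return sorted(gaps)


# Invented records: each is a set of annotated concepts.
records = [
    {"chest pain", "dyspnea", "heart failure"},
    {"dyspnea", "edema", "heart failure"},
    {"edema", "heart failure"},
    {"dyspnea", "fatigue"},
]
kb = {frozenset({"dyspnea", "heart failure"})}  # relations the KB already has
print(candidate_gaps(records, kb))
```

Co-occurrence frequency alone cannot say what the relation is (symptom, comorbidity, or treatment side effect). That is why the output is a list of candidates rather than new knowledge base facts.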
