
Stream reasoning: an approach to tame the velocity and variety dimensions of Big Data

Big Data tech can tame volume and velocity. Taming variety in the presence of volume and velocity is the real challenge. I’ve been working on taming variety and velocity simultaneously (Stream Reasoning) for 10 years now. In this talk, I give you some examples of application domains where this is necessary. I explain how far the Stream Reasoning community has come in theory, applications and products. In particular, I focus on my applications and my startup Fluxedo, which offers real-time social media analytics across social networks. I conclude the talk by discussing what comes next: 1) the need to focus on languages and abstractions able to easily capture user needs; 2) the need to find the sweet spot between scalability and expressive semantics; 3) the need to use semantics to model more than the data access; and 4) the need to get over imperfect data. If you are excited, I did my job for today!

Published in: Data & Analytics

  1. STREAM REASONING: AN APPROACH TO TAME THE VELOCITY AND VARIETY DIMENSIONS OF BIG DATA
     Emanuele Della Valle
     Politecnico di Milano
     http://emanueledellavalle.org
     @manudellavalle
     Oslo, Norway - 15.6.2017
  2. BIG DATA TECHS CAN TAME VOLUME
     ▸ Hadoop, MapReduce, Hive
     ▸ “schema on read” methodology
     ▸ Spark (x100 faster)
     ▸ “data lake” concept
     Emanuele Della Valle - http://emanueledellavalle.org - @manudellavalle
  3. BIG DATA TECHS CAN TAME VELOCITY
     ▸ Storm
     ▸ Kafka
     ▸ Spark Streaming
     ▸ Flink
     ▸ paradigmatic change
       ▸ from persistent data and transient queries
       ▸ to persistent queries and transient data
  4. BIG DATA TECHS CANNOT TAME VOLUME AND VELOCITY SIMULTANEOUSLY
     [Figure: chart with a data-volume axis (KB, MB, GB, TB, PB, EB, ZB) and a time axis (ms., sec., min., hours, days, months)]
  5. BIG DATA TECHS CAN TAME VARIETY USING SEMANTIC TECHNOLOGIES
     ▸ RDF data model
     ▸ SPARQL query language
     ▸ OWL ontological language
     ▸ R2RML mapping language
     ▸ Ontology Based Data Access methodology
  6. BIG DATA TECHS: VARIETY MAKES PROBLEMS HARDER
     [Figure: chart with a data-volume axis (KB, MB, GB, TB, PB, EB, ZB), a time axis (ms., sec., min., hours, days, months) and a further dimension labelled VARIETY]
     STILL THERE ARE USERS WHOSE DECISIONS NEED TO TAME ALL Vs
  7. STILL THERE ARE USERS WHOSE DECISIONS NEED TO TAME ALL Vs: OFF-SHORE OIL OPERATIONS
     ‣ When sensors on a drilling pipe in an oil-rig indicate that it is about to get stuck, how long, according to historical records, can I keep drilling?
     ‣ 400,000 sensors from tens of different producers
     ‣ 10,000 observations per second, many out of operational ranges
  8. STILL THERE ARE USERS WHOSE DECISIONS NEED TO TAME ALL Vs: SMART CITIES
     ▸ Can you suggest where to spend my next hours given my interests, the presence of people and what they are doing?
     ▸ 100,000s of people generating 10,000s of information items per second
  9. STILL THERE ARE USERS WHOSE DECISIONS NEED TO TAME ALL Vs: SOCIAL MEDIA ANALYSIS
     ▸ Who are the current top influencer users that are driving the discussion about the top emerging topics across all the social networks?
     ▸ billions of active users (Facebook, 1.86 bln in February 2017)
     ▸ millions of actions (Facebook, 2.92 mln posts per minute)
  10. STILL THERE ARE USERS WHOSE DECISIONS NEED TO TAME ALL Vs: REQUIREMENT ANALYSIS
      A system able to answer those queries must be able to
      ▸ handle massive datasets x
      ▸ process data streams on the fly x
      ▸ cope with heterogeneous datasets x
      ▸ cope with incomplete data x x
      ▸ cope with noisy data x
      ▸ provide reactive answers x
      ▸ support fine-grained information access x x
      ▸ integrate complex domain models x
      [Table columns: Volume, Velocity, Variety, Veracity; each x marks one dimension the requirement relates to]
  11. STILL THERE ARE USERS WHOSE DECISIONS NEED TO TAME ALL Vs: (PARTIAL) SOLUTIONS: STREAM PROCESSING
      ▸ A paradigmatic change!
      [Figure labels: input streams, window, Registered Continuous Query, streams of answers, Dynamic System]
  12. STILL THERE ARE USERS WHOSE DECISIONS NEED TO TAME ALL Vs: STREAM PROCESSING VS. REQUIREMENTS
      [Table: Requirement vs. SP. Rows: massive datasets; data streams; heterogeneous dataset; incomplete data; noisy data; reactive answers; fine-grained information access; complex domain models]
  13. STILL THERE ARE USERS WHOSE DECISIONS NEED TO TAME ALL Vs: (PARTIAL) SOLUTIONS: SEMANTIC TECHS
      ▸ Given an ontology O (an information model), a query Q and a set of ground facts A contained in multiple heterogeneous databases …,
      ▸ use O to rewrite Q as Q’ so that
        ▸ answer(Q,O,A) = answer(Q’,∅,A)
          The answer of the query Q using the ontology O for any set of ground facts A is equal to the answer of a query Q’ without considering the ontology O
      ▸ Use mapping M to map Q’ to multiple SQL queries to the various databases
      [Figure labels: O, Q, Rewrite, Q’, M, Map, SQL, A, answer]
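The rewriting idea on this slide can be illustrated with a minimal Python sketch. It assumes a toy ontology made only of subclass axioms and a query that asks for the instances of one class; the class names, the facts and all function names are hypothetical, and the mapping step from Q’ to SQL via M is left out. It only shows that answering Q over facts saturated with O gives the same result as answering the rewritten Q’ over the raw facts.

```python
# Hypothetical toy ontology O: each class maps to its direct subclasses.
ONTOLOGY = {
    "Sensor": {"PressureSensor", "TemperatureSensor"},
    "PressureSensor": {"DownholePressureSensor"},
}

# Hypothetical ground facts A: (individual, asserted class).
FACTS = {
    ("s1", "PressureSensor"),
    ("s2", "DownholePressureSensor"),
    ("s3", "Pump"),
    ("s4", "Sensor"),
}


def rewrite(query_class, ontology):
    """Rewrite Q into Q': the query class plus all of its subclasses."""
    seen = set()
    stack = [query_class]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(ontology.get(current, set()))
    return seen


def answer(query_classes, facts):
    """Evaluate a union-of-classes query over the facts, ignoring any ontology."""
    return {ind for ind, cls in facts if cls in query_classes}


def saturate(ontology, facts):
    """Materialise every fact implied by the subclass axioms of the ontology."""
    parents = {}
    for sup, subs in ontology.items():
        for sub in subs:
            parents.setdefault(sub, set()).add(sup)
    derived = set(facts)
    frontier = list(facts)
    while frontier:
        ind, cls = frontier.pop()
        for sup in parents.get(cls, ()):
            if (ind, sup) not in derived:
                derived.add((ind, sup))
                frontier.append((ind, sup))
    return derived


# answer(Q, O, A): query the saturated facts with the original class only.
with_ontology = answer({"Sensor"}, saturate(ONTOLOGY, FACTS))
# answer(Q', ∅, A): query the raw facts with the rewritten class set.
rewritten = answer(rewrite("Sensor", ONTOLOGY), FACTS)
print(with_ontology == rewritten)
```

Rewriting moves the cost of the ontology into the query, so the facts can stay untouched in their original databases.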
  14. STILL THERE ARE USERS WHOSE DECISIONS NEED TO TAME ALL Vs: SEMANTIC TECHS VS. REQUIREMENTS
      [Table: Requirement vs. SP and ST. Rows: massive datasets; data streams; heterogeneous dataset; incomplete data; noisy data; reactive answers; fine-grained information access; complex domain models]
  15. STILL THERE ARE USERS WHOSE DECISIONS NEED TO TAME ALL Vs: STREAM REASONING RESEARCH QUESTION
      Is it possible to make sense in real time of multiple, heterogeneous, gigantic and inevitably noisy and incomplete data streams in order to support the decision processes of extremely large numbers of concurrent users?
      E. Della Valle, S. Ceri, F. van Harmelen & H. Stuckenschmidt, 2010
  16. STREAM REASONING THEORY: STREAM PROCESSING
      [Figure: a stream of coloured items along a time axis, with a 1 minute wide window]
      ▸ Which are the top-4 most frequent colours in the last minute? ([colour], 13), ([colour], 12), ([colour], 8), ([colour], 8)
      ▸ Is there a [colour] followed by a [colour] in the last minute? Yes, many
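The two continuous queries on this slide can be sketched in a few lines of Python. This is an illustrative sketch only, not how any of the engines named in the talk work: the class name, the method names and the eviction rule (items exactly one window width old are dropped) are assumptions.

```python
from collections import Counter, deque


class SlidingWindow:
    """Time-based sliding window over a stream of (timestamp, colour) items."""

    def __init__(self, width_seconds=60.0):
        self.width = width_seconds
        self.items = deque()      # items currently inside the window, oldest first
        self.counts = Counter()   # colour -> occurrences inside the window

    def add(self, timestamp, colour):
        """Append a new item and forget everything that fell out of the window."""
        self.items.append((timestamp, colour))
        self.counts[colour] += 1
        while self.items and self.items[0][0] <= timestamp - self.width:
            _, old = self.items.popleft()
            self.counts[old] -= 1
            if self.counts[old] == 0:
                del self.counts[old]

    def top(self, k):
        """Return the k most frequent colours in the window with their counts."""
        return self.counts.most_common(k)

    def followed_by(self, first, second):
        """Return True if some `first` item precedes some `second` item in the window."""
        seen_first = False
        for _, colour in self.items:
            if seen_first and colour == second:
                return True
            if colour == first:
                seen_first = True
        return False
```

Because the stream is ordered and old items can be forgotten, each new item costs only an append plus the evictions it triggers; the query is registered once and re-evaluated as the window slides.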
  17. STREAM REASONING THEORY: STREAM PROCESSING + SEMANTIC TECHS
      [Figure: a stream of coloured items along a time axis, with a 1 minute wide window, next to an ontology of colours]
  18. STREAM REASONING THEORY: STREAM REASONING
      [Figure: a stream of coloured items along a time axis, with a 1 minute wide window, next to an ontology of colours]
      ▸ Which are the top-2 most frequent cool colours in the last minute? ([colour], 13), ([colour], 8), ([colour], 8)
      ▸ Is there a primary cool colour followed by a secondary warm one? Yes, [colour] followed by [colour].
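A minimal sketch of how an ontology changes what can be asked of the same window, assuming a hypothetical colour ontology that tags each colour with the classes it belongs to. The colour-to-class assignments, the function names and the sample events are illustrative assumptions, not taken from the slides.

```python
from collections import Counter

# Hypothetical ontology of colours: colour -> set of classes it belongs to.
COLOUR_CLASSES = {
    "blue": {"primary", "cool"},
    "green": {"secondary", "cool"},
    "violet": {"secondary", "cool"},
    "red": {"primary", "warm"},
    "yellow": {"primary", "warm"},
    "orange": {"secondary", "warm"},
}


def in_window(events, now, width=60.0):
    """Keep the (timestamp, colour) events that fall in the last `width` seconds."""
    return [(t, c) for t, c in events if now - width < t <= now]


def top_k(events, required_classes, k):
    """Top-k colours among those that belong to all of the required classes."""
    counts = Counter(
        c for _, c in events
        if required_classes <= COLOUR_CLASSES.get(c, set())
    )
    return counts.most_common(k)


def followed_by(events, first_classes, second_classes):
    """Return the first (earlier, later) colour pair matching the pattern, or None."""
    first = None
    for _, colour in sorted(events):
        classes = COLOUR_CLASSES.get(colour, set())
        if first is not None and second_classes <= classes:
            return (first, colour)
        if first is None and first_classes <= classes:
            first = colour
    return None


events = [(0, "blue"), (5, "green"), (10, "blue"), (15, "orange"), (20, "red")]
window = in_window(events, now=30)
print(top_k(window, {"cool"}, 2))
print(followed_by(window, {"primary", "cool"}, {"secondary", "warm"}))
```

The queries now mention classes such as "cool" or "primary" that never appear in the stream itself; the ontology bridges the gap between the data and the vocabulary of the user.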
  19. STREAM REASONING THEORY: STREAM REASONING
      [Figure: a stream of coloured items along a time axis, with a 1 minute wide window, next to a better ontology of colours]
      ▸ Which are the most frequent sentiments in the last minute?
      ▸ Is there an impulsive, irritating colour followed by a happy one?
      The better the ontology of colours we are using, the more expressive the queries we can register
  20. STREAM REASONING THEORY: 1000 SCIENTIFIC PAPERS IN 10 YEARS
      ▸ It is possible to extend the Semantic Web stack in order to represent heterogeneous data streams (RDF streams), continuous queries (C-SPARQL, CQELS-QL, … RSP-QL), and continuous reasoning (LARS, STARQL, …) tasks
      ▸ The ordered nature of data streams and the possibility of forgetting old enough information make it possible to optimise continuous querying (C-SPARQL Engine, CQELS, MorphStream, … RSP Engine) and continuous reasoning (IMaRS, RDFox, StreamRule, ETALIS…) tasks so as to provide reactive answers
      ▸ Semantic Web and Machine Learning technologies can be jointly employed to cope with the noisy and incomplete nature of data streams
  21. STREAM REASONING THEORY: STREAM REASONING PARADIGMATIC CHANGE ENABLED
      [Figure, "Traditional" highlighted: TRADITIONAL APPROACH (data “in-motion” is put “at-rest” in a DWH, then analysis yields insight) next to the PANOPTIQUE APPROACH (a registered analysis over data “in-motion”, using ontology + mappings, yields insights “in-motion”)]
  22. STREAM REASONING THEORY: STREAM REASONING PARADIGMATIC CHANGE ENABLED
      [Figure, "Traditional" and "Stream Reasoning" highlighted: TRADITIONAL APPROACH (data “in-motion” is put “at-rest” in a DWH, then analysis yields insight) next to the PANOPTIQUE APPROACH (a registered analysis over data “in-motion”, using ontology + mappings, yields insights “in-motion”)]
  23. STREAM REASONING: (MY) APPLICATIONS
      ▸ BOTTARI: Winner of Semantic Web Challenge 2011
      ▸ URBAN BIG DATA SCIENCE: Winner of IBM faculty award 2013; funded by 8 EIT Digital yearly grants
  24. STREAM REASONING: URBAN BIG DATA SCIENCE: CITYSENSING PROJECT
  25. STREAM REASONING: URBAN BIG DATA SCIENCE: CROWDINSIGHTS PROJECT
      [Figure: chart with the labels October, July and 1000]
  26. STREAM REASONING: PRODUCTS: I STARTED UP
  27. STREAM REASONING: PRODUCTS: I STARTED UP
  28. STREAM REASONING: PRODUCTS: I STARTED UP
  29. STREAM REASONING: STREAM REASONING VS. REQUIREMENTS
      [Table: Requirement vs. Stream Reasoning. Rows: massive datasets; data streams; heterogeneous dataset; incomplete data; noisy data; reactive answers; fine-grained information access; complex domain models]
      [Legend: not specifically treated so far; treated but not resolved; universally addressed by all studies]
  30. STREAM REASONING: NOW WHAT?
      ▸ Focus on languages and abstractions able to easily capture user needs
        ▸ Analytic queries
          ▸ Which electricity-producing turbine has sensor readings similar (i.e., Pearson correlated by at least 0.75) to any turbine that subsequently had a critical failure in the past year?
        ▸ Advanced analytics (Machine Learning) tasks
          ▸ Where am I likely going to run into a traffic jam during my commute tonight and how long will it take, given current weather and traffic conditions?
        ▸ … many more …
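The similarity test in the turbine query can be made concrete with a small Python sketch. It assumes each turbine is represented by an equal-length list of sensor readings; the function names and the data layout are hypothetical, and the temporal part of the query (the failure happening afterwards, within the past year) is not modelled.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length, at least 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def similar_to_failed(readings, failed_ids, threshold=0.75):
    """Turbines whose readings correlate with any failed turbine by at least `threshold`."""
    result = set()
    for turbine_id, series in readings.items():
        if turbine_id in failed_ids:
            continue
        if any(pearson(series, readings[f]) >= threshold for f in failed_ids):
            result.add(turbine_id)
    return result
```

A language that captures user needs would let the analyst state the 0.75 threshold and the failure condition declaratively, rather than writing this loop by hand.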
  31. STREAM REASONING: NOW WHAT?
      ▸ Find the sweet-spot between scalability and expressive semantics
        ▸ the data access layers are clear (enough)
        ▸ … but, what kind of reasoning should we put at the top?
        ▸ Rule language? Answer set programming? Temporal logic?
      [Figure: Complexity vs. Dynamics. Layers: Raw Stream Processing, Semantic Streams, DL-Lite, ???. Steps: Selection, Interpretation, Abstraction, Reasoning, Re-writing, Mapping. Complexity axis: AC0, PTIME, NEXPTIME. Change Frequency axis: 10^4 Hz, 1 Hz]
  32. STREAM REASONING: NOW WHAT?
      ▸ Use semantics to model more than the data access
      ▸ Data are imperfect, get over it!
  33. STREAM REASONING: ARE YOU INTERESTED IN LEARNING MORE?
      ▸ the official stream reasoning community web site
        ▸ http://streamreasoning.org/
      ▸ the RDF Stream Processing W3C community
        ▸ https://www.w3.org/community/rsp/
      ▸ my personal pages
        ▸ http://emanueledellavalle.org/ + twitter: @manudellavalle
      ▸ my company page
        ▸ http://fluxedo.com/en/
  34. STREAM REASONING: THANK YOU! ANY QUESTIONS?
      Emanuele Della Valle
      Politecnico di Milano
      http://emanueledellavalle.org
      @manudellavalle
      Oslo, Norway - 15.6.2017
