Stream Reasoning - where we got so far 2011.1.18 Oxford Key Note

  1. Stream Reasoning: Where We Got So Far
     Oxford - 2011.1.18
     http://streamreasoning.org
     Emanuele Della Valle, DEI - Politecnico di Milano
     emanuele.dellavalle@polimi.it, http://emanueledellavalle.org
     Joint work with: Davide Francesco Barbieri, Daniele Braga, Stefano Ceri, and Michael Grossniklaus
     For more information visit http://wiki.larkc.eu/UrbanComputing
  2. Agenda
     •  Motivation
     •  Running Example
     •  Background
     •  Concept
     •  Achievements
     •  Retrospective and Conclusions
     Oxford, 2011-1-18 Emanuele Della Valle - visit http://streamreasoning.org
  3. Motivation: It's a Streaming World! [IEEE-IS2009]
     •  Sensor networks, …
     •  traffic engineering, …
     •  social networking, …
     •  financial markets, …
     •  generate streams!
  4. Running Example: Real-Time Streams on the Web
     •  Streams are appearing more and more often on the Web, in sites that distribute and present information in real-time streams.
     •  Check out http://activitystrea.ms/ for a standard API
     •  E.g. [Figure: examples not preserved]
  5. Running Example: Examples of Questions Users are Asking
     •  Which topics have my close friends discussed in the last hour?
     •  Which book is my friend likely to read next?
     •  What impact have I been creating with my tweets in the last day?
     •  …
     •  <query> … <time dimension> ?
  6. Motivation: Problem Statement
     •  Making sense
        –  in real time
        –  of gigantic and inevitably noisy data streams
        –  in order to support the decision process of extremely large numbers of concurrent users
  7. Background: What are data streams anyway?
     •  Formally:
        –  Data streams are unbounded sequences of time-varying data elements
     •  Less formally:
        –  an (almost) continuous flow of information
        –  with the recent information being more relevant as it describes the current state of a dynamic system
  8. Background: Continuous Semantics
     •  Processing data streams in the space of one-time semantics is difficult because of the very nature of the underlying data
     •  Innovative* assumption: continuous semantics!
        –  streams can be consumed on the fly rather than being stored forever and
        –  queries are registered and continuously produce answers
     * This innovation arose in the DB community in the 90s
  9. Background: Stream Processing
     •  Continuous queries are registered over streams that are observed through windows
     [Figure: a window over the input stream feeds a registered continuous query, which produces a stream of answers]
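The window idea on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the stream is a finite, timestamp-ordered list, and the names `window_at`, `run_continuous`, `range_s` and `step_s` are hypothetical and do not come from any DSMS.

```python
from typing import Callable, List, Tuple

# Hypothetical element layout: (payload, timestamp in seconds).
Element = Tuple[str, float]

def window_at(stream: List[Element], now: float, range_s: float) -> List[str]:
    """Return the payloads whose timestamp lies in the window (now - range_s, now]."""
    return [item for item, ts in stream if now - range_s < ts <= now]

def run_continuous(stream: List[Element],
                   query: Callable[[List[str]], object],
                   range_s: float, step_s: float, end: float) -> list:
    """Re-evaluate a registered query every step_s seconds over the current window."""
    answers = []
    now = step_s
    while now <= end:
        answers.append((now, query(window_at(stream, now, range_s))))
        now += step_s
    return answers

# The registered "query" here simply counts the elements in the window.
stream = [("x", 1.0), ("y", 4.0), ("z", 7.0)]
answers = run_continuous(stream, len, range_s=5.0, step_s=5.0, end=10.0)
```

The query is registered once and produces one answer per step, which is the sense in which the answers themselves form a stream.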
  10. Background: Data Stream Management Systems (DSMS)
      •  Research Prototypes
         –  Amazon/Cougar (Cornell) – sensors
         –  Aurora (Brown/MIT) – sensor monitoring, dataflow
         –  Gigascope (AT&T Labs) – network monitoring
         –  Hancock (AT&T) – telecom streams
         –  Niagara (OGI/Wisconsin) – Internet DBs & XML
         –  OpenCQ (Georgia) – triggers, view maintenance
         –  Stream (Stanford) – general-purpose DSMS
         –  Stream Mill (UCLA) – power & extensibility
         –  Tapestry (Xerox) – publish/subscribe filtering
         –  Telegraph (Berkeley) – adaptive engine for sensors
         –  Tribeca (Bellcore) – network monitoring
      •  High-tech startups
         –  Streambase, Coral8, Apama, Truviso
      •  Major DBMS vendors are all adding stream extensions as well
         –  Oracle http://www.oracle.com/technology/products/dataint/htdocs/streams_fo.html
         –  DB2 http://www.eweek.com/c/a/Database/IBM-DB2-Turns-25-and-Prepares-for-New-Life/
  11. Background: Can the Semantic Web process data streams?
      •  The Semantic Web, the Web of Data, is doing fine
         –  RDF, RDF Schema, SPARQL, OWL, RIF
         –  well understood theory,
         –  rapid increase in scalability
      •  BUT it pretends that the world is static, or at best has a low change rate both in change-volume and change-frequency
         –  ontology versioning
         –  belief revision
         –  time stamps on named graphs
      •  It sticks to the traditional one-time semantics
  12. Concept: Stream Reasoning [IEEE-IS2010]
      •  Idea origination
         –  Can continuous semantics be ported to reasoning?
         –  This is an unexplored yet high impact research area!
      •  Stream Reasoning
         –  Logical reasoning in real time on gigantic and inevitably noisy data streams in order to support the decision process of extremely large numbers of concurrent users.
            -- S. Ceri, E. Della Valle, F. van Harmelen and H. Stuckenschmidt, 2010
      •  Note: making sense of streams necessarily requires processing them against rich background knowledge
  13. Concept: Research Challenges
      •  Relation with data-stream systems
         –  Just as RDF relates to database systems?
      •  Query languages for semantic streams
         –  Just as SPARQL for RDF but with continuous semantics?
      •  Reasoning on Streams
         –  Formal representations for stream reasoning
         –  Notions of soundness and completeness
         –  Efficiency
         –  Scalability
      •  Dealing with incomplete & noisy data
         –  Even more so than on the current Web of Data
      •  Distributed and parallel processing
         –  Streams are parallel in nature
  14. Achievements: Explored Continuous Semantics for SeWeb
      •  We investigated
         –  Architecture of a Stream Reasoner
         –  RDF streams
            •  the natural extension of the RDF data model to the new continuous scenario and
         –  Continuous SPARQL (or simply C-SPARQL)
            •  the extension of SPARQL for querying RDF streams.
         –  Efficient incremental updates of deductive closures
            •  specifically considering the nature of data streams
         –  Effective inductive stream reasoning (joint work with Siemens - Munich)
            •  See paper in IEEE IS special issue on Social Media Analytics
  15. Achievements: Architecture [IEEE-IS2010]
      [Figure: architecture for Social Media Analytics. Components: Window, DSMS, Selector, Abstracter, Deductive Reasoner, Inductive Reasoner, Long-Term Matrix, Hype Matrix. Legend: data stream, RDF stream, RDF graph, C = C-SPARQL query, P = SPARQL with Probability]
      •  Based on the LarKC conceptual framework http://www.larkc.eu
  16. Achievements: RDF Stream [WWW2009,EDBT2010,IJSC2010]
      •  RDF Stream Data Type
         –  Ordered sequence of pairs, where each pair is made of an RDF triple and its timestamp t
               (< triple >, t)
      •  E.g.,
               (<:Giulia :likes :Twilight >, 2010-02-12T13:34:41)
               (<:John :likes :TheLordOfTheRings >, 2010-02-12T13:36:28)
               (<:Alice :dislikes :Twilight >, 2010-02-12T13:36:28)
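The RDF stream data type can be mirrored in a small Python sketch. The class name `TimestampedTriple` and the helper `is_rdf_stream` are hypothetical, and reading "ordered" as "non-decreasing timestamps" is an assumption suggested by the example, where two triples share a timestamp.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List

@dataclass(frozen=True)
class TimestampedTriple:
    """One stream element: an RDF triple plus its timestamp t."""
    subject: str
    predicate: str
    obj: str
    timestamp: datetime

def is_rdf_stream(elements: List[TimestampedTriple]) -> bool:
    """Check the ordering assumption: timestamps never decrease along the sequence."""
    return all(a.timestamp <= b.timestamp for a, b in zip(elements, elements[1:]))

# The three elements shown on the slide.
stream = [
    TimestampedTriple(":Giulia", ":likes", ":Twilight",
                      datetime.fromisoformat("2010-02-12T13:34:41")),
    TimestampedTriple(":John", ":likes", ":TheLordOfTheRings",
                      datetime.fromisoformat("2010-02-12T13:36:28")),
    TimestampedTriple(":Alice", ":dislikes", ":Twilight",
                      datetime.fromisoformat("2010-02-12T13:36:28")),
]
```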
  17. Achievements: C-SPARQL [WWW2009,EDBT2010,IJSC2010]
      •  We specified the C-SPARQL syntax
         –  Incrementally, from existing specifications
            •  Including windows, grouping, aggregates, timestamping
      •  We gave the formal semantics of C-SPARQL
         –  Query registration, handling overloads
         –  Order of evaluation, pattern matching over time, …
      •  We investigated efficiency of evaluation
         –  Defining a suitable algebra
         –  Applying optimizations
         –  Efficient materialization of inferred data from streams
  18. Achievements: An Example of C-SPARQL Query
      Who are the opinion makers? i.e., the users who are likely to influence the behavior of other users who follow them

          REGISTER STREAM OpinionMakers COMPUTED EVERY 5m AS
          CONSTRUCT { ?opinionMaker sd:about ?resource }
          FROM STREAM <http://streamingsocialdata.org/interactions> [RANGE 30m STEP 5m]
          WHERE {
            ?opinionMaker ?opinion ?resource .
            ?follower sioc:follows ?opinionMaker .
            ?follower ?opinion ?resource .
            FILTER ( cs:timestamp(?follower) > cs:timestamp(?opinionMaker)
                     && ?opinion != sd:accesses )
          }
          HAVING ( COUNT(DISTINCT ?follower) > 3 )
  19. Achievements: An Example of C-SPARQL Query (annotated)
      Who are the opinion makers? i.e., the users who are likely to influence the behavior of other users who follow them

          # Query registration (for continuous execution);
          # RDF Stream added as new output format
          REGISTER STREAM OpinionMakers COMPUTED EVERY 5m AS
          CONSTRUCT { ?opinionMaker sd:about ?resource }
          # FROM STREAM clause, with its WINDOW in square brackets
          FROM STREAM <http://streamingsocialdata.org/interactions> [RANGE 30m STEP 5m]
          WHERE {
            ?opinionMaker ?opinion ?resource .
            ?follower sioc:follows ?opinionMaker .
            ?follower ?opinion ?resource .
            # Builtin to access timestamps
            FILTER ( cs:timestamp(?follower) > cs:timestamp(?opinionMaker)
                     && ?opinion != sd:accesses )
          }
          # Aggregates as in SPARQL 1.1
          HAVING ( COUNT(DISTINCT ?follower) > 3 )
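To make the intent of the query concrete, here is a rough Python emulation of one evaluation over a single window. It is not how a C-SPARQL engine works. It assumes the window is a list of `(subject, predicate, object, timestamp)` tuples, that the `sioc:follows` facts are available as a separate set, that `cs:timestamp` compares the timestamps of the two opinion triples, and that counting is done per (opinion maker, resource) pair. The function name and the sample data are hypothetical.

```python
from collections import defaultdict
from typing import List, Set, Tuple

Quad = Tuple[str, str, str, int]  # (subject, predicate, object, timestamp)

def opinion_makers(window: List[Quad],
                   follows: Set[Tuple[str, str]],
                   min_followers: int = 3) -> Set[Tuple[str, str]]:
    """Return (opinionMaker, resource) pairs imitated by more than min_followers followers."""
    imitators = defaultdict(set)
    for maker, opinion, resource, t_maker in window:
        if opinion == "sd:accesses":          # ?opinion != sd:accesses
            continue
        for follower, opinion2, resource2, t_follower in window:
            if (opinion2 == opinion and resource2 == resource
                    and (follower, maker) in follows   # ?follower sioc:follows ?opinionMaker
                    and t_follower > t_maker):         # follower acted after the maker
                imitators[(maker, resource)].add(follower)
    # HAVING ( COUNT(DISTINCT ?follower) > 3 )
    return {key for key, who in imitators.items() if len(who) > min_followers}

window = [
    ("alice", "sd:likes", "Twilight", 1),
    ("f1", "sd:likes", "Twilight", 2),
    ("f2", "sd:likes", "Twilight", 3),
    ("f3", "sd:likes", "Twilight", 4),
    ("f4", "sd:likes", "Twilight", 5),
    ("bob", "sd:likes", "Dune", 1),
    ("f1", "sd:likes", "Dune", 2),
]
follows = {("f1", "alice"), ("f2", "alice"), ("f3", "alice"), ("f4", "alice"), ("f1", "bob")}
makers = opinion_makers(window, follows)
```

In this toy window only `alice` qualifies: four distinct followers repeated her opinion after she expressed it, while `bob` has a single imitator.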
  20. Achievements: Efficiency of Evaluation 1/3 [IEEE-IS2010]
      •  Evaluation of Window-based Selection
  21. Achievements: Efficiency of Evaluation 2/3 [EDBT2010]
      •  Several transformations can be applied to the algebraic representation of C-SPARQL
      •  some recalling well known results from classical relational optimization
         –  push of FILTERs and projections
      •  some being more specific to the domain of streams.
         –  push of aggregates.
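The effect of pushing a FILTER below a join can be illustrated with a toy example. This is a generic relational-style sketch rather than the C-SPARQL algebra itself, and the function names and data are hypothetical. Both plans return the same rows, but the pushed-down plan builds a smaller intermediate result.

```python
from typing import Callable, List, Tuple

Triple = Tuple[str, str, str]   # (subject, predicate, object) from the stream
Follow = Tuple[str, str]        # (follower, followed) from static data

def join_then_filter(triples: List[Triple], follows: List[Follow],
                     keep: Callable[[str], bool]):
    """Join first, filter afterwards; also report the intermediate size."""
    joined = [(s, p, o, f) for (s, p, o) in triples for (f, m) in follows if m == s]
    return [row for row in joined if keep(row[1])], len(joined)

def filter_then_join(triples: List[Triple], follows: List[Follow],
                     keep: Callable[[str], bool]):
    """Push the filter down to the stream side, then join."""
    filtered = [t for t in triples if keep(t[1])]
    joined = [(s, p, o, f) for (s, p, o) in filtered for (f, m) in follows if m == s]
    return joined, len(joined)

triples = [("alice", "likes", "Twilight"),
           ("alice", "accesses", "Twilight"),
           ("bob", "accesses", "Dune")]
follows = [("carol", "alice"), ("dave", "alice"), ("erin", "bob")]
not_access = lambda predicate: predicate != "accesses"

rows_a, intermediate_a = join_then_filter(triples, follows, not_access)
rows_b, intermediate_b = filter_then_join(triples, follows, not_access)
```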
  22. Achievements: Efficiency of Evaluation 3/3 [EDBT2010]
      •  Push of filters and projections
      [Chart: evaluation time in ms (0 to 125) against window size (10 to 100000); series: None, Static Only, Streaming Only, Both]
  23. Achievements: Example of C-SPARQL and Reasoning 1/2
      What impact have I been creating with my tweets in the last hour? Is it positive or negative? Let’s count them …

          REGISTER QUERY CountPositiveAndNegativeReactions AS
          PREFIX : <http://ex.org/twitterImpactMining#>
          SELECT ?t count(?pos) count(?neg)
          FROM STREAM <http://ex.org/discussions.trdf> [RANGE 30m STEP 30s]
          WHERE {
            ?t a :MonitoredTweet .
            { ?pos :discuss ?t ; :ProduceReaction [ a :PositiveReaction ] . }
            UNION
            { ?neg :discuss ?t ; :ProduceReaction [ a :NegativeReaction ] . }
          }
          GROUP BY ?t

      Background knowledge:

          :discuss a owl:TransitiveProperty .
          :reply rdfs:subPropertyOf :discuss .
          :retweet rdfs:subPropertyOf :discuss .
  24. Achievements: Example of C-SPARQL and Reasoning 2/2
      [Figure: tweets t1, t1-1, t1-2 and t1-3 linked by asserted retweet and reply edges and by inferred discuss edges; legend: Monitored, Positive, Negative]
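The inference behind this example can be sketched as follows. The code applies only the two kinds of axioms from the background knowledge (sub-property and transitivity) to a hypothetical set of edges; the exact edges of the original figure are not recoverable, so the triples below are an assumption made for illustration.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

# Background knowledge: :reply and :retweet are sub-properties of :discuss.
SUB_PROPERTY_OF = {":reply": ":discuss", ":retweet": ":discuss"}

def materialize(triples: Set[Triple]) -> Set[Triple]:
    """Add the :discuss triples entailed by the sub-property and transitivity axioms."""
    inferred = set(triples)
    # rdfs:subPropertyOf: every :reply or :retweet is also a :discuss
    inferred |= {(s, SUB_PROPERTY_OF[p], o) for (s, p, o) in triples if p in SUB_PROPERTY_OF}
    # owl:TransitiveProperty: repeat until no new :discuss triple appears
    while True:
        discuss = {(s, o) for (s, p, o) in inferred if p == ":discuss"}
        new = {(a, ":discuss", d) for (a, b) in discuss for (c, d) in discuss if b == c}
        new -= inferred
        if not new:
            return inferred
        inferred |= new

# Hypothetical chain of reactions to the monitored tweet t1.
asserted = {
    ("t1-1", ":retweet", "t1"),
    ("t1-2", ":reply", "t1-1"),
    ("t1-3", ":retweet", "t1-2"),
}
closure = materialize(asserted)
discussing_t1 = {s for (s, p, o) in closure if p == ":discuss" and o == "t1"}
```

With reasoning, all three tweets count as discussing `t1`; without it, only the direct retweet `t1-1` would be found.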
  25. Achievements: State-of-the-Art Approach [Ceri1994,Volz2005]
      1.  Overestimation of deletion: overestimates deletions by computing all direct consequences of a deletion.
      2.  Rederivation: prunes those estimated deletions for which alternative derivations (via some other facts in the program) exist.
      3.  Insertion: adds the new derivations that are consequences of insertions to extensional predicates.
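The three steps above can be tried out on a toy program with a single rule, transitivity over one binary relation. This is a simplified sketch written for illustration, not the algorithm as published in [Ceri1994] or [Volz2005]; the function names and the data are hypothetical.

```python
from typing import Set, Tuple

Fact = Tuple[str, str]  # (x, y), read as "x discusses y"

def one_step(facts: Set[Fact]) -> Set[Fact]:
    """Apply the transitivity rule once: (x, y) and (y, z) give (x, z)."""
    return {(x, z) for (x, y) in facts for (y2, z) in facts if y == y2}

def closure(facts: Set[Fact]) -> Set[Fact]:
    """Compute the full materialization by iterating the rule to a fixpoint."""
    closed = set(facts)
    while True:
        new = one_step(closed) - closed
        if not new:
            return closed
        closed |= new

def dred(asserted: Set[Fact], materialized: Set[Fact],
         deletions: Set[Fact], insertions: Set[Fact]) -> Set[Fact]:
    # 1. Overestimation: every fact with some derivation that uses a deleted fact.
    over = set(deletions)
    frontier = set(deletions)
    while frontier:
        consequences = set()
        for (x, y) in frontier:
            for (u, v) in materialized:
                if y == u:
                    consequences.add((x, v))
                if v == x:
                    consequences.add((u, y))
        frontier = consequences - over
        over |= frontier
    # 2. Rederivation: put back overestimated facts that are still asserted
    #    or that follow from the facts that survived.
    kept = (materialized - over) | (over & (asserted - deletions))
    while True:
        rederived = (one_step(kept) & over) - kept
        if not rederived:
            break
        kept |= rederived
    # 3. Insertion: add the new facts and their consequences.
    return closure(kept | insertions)

asserted = {("a", "b"), ("b", "c"), ("c", "d"), ("a", "c")}
materialized = closure(asserted)
deletions, insertions = {("b", "c")}, {("d", "e")}
updated = dred(asserted, materialized, deletions, insertions)
```

In this run the fact `("a", "c")` is first overestimated as deleted, because it can be derived through `("b", "c")`, and is then rederived because it is also asserted directly.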
  26. Achievements: Our Approach [ESWC2010] 1/2
      •  Assumption
         –  Insertions and deletions are triples respectively entering and exiting the window
         –  The window size is known
      •  Therefore
         –  The time when each triple will expire is known and determined by the window size
            •  E.g., if the window is 10s long, a triple entering at time t will exit at time t+10s
         –  Note: all knowledge can be annotated with an expiration time
            •  i.e., background knowledge is annotated with +∞
  27. Achievements: Our Approach [ESWC2010] 2/2
      •  The algorithm
         1.  deletes all triples (asserted or inferred) that have just expired,
         2.  computes the entailments derived by the inserts,
         3.  annotates each entailed triple with an expiration time, and
         4.  eliminates from the current state all copies of derived triples except the one with the highest timestamp.
      •  Learn more
         –  http://www.slideshare.net/emanueledellavalle/incremental-reasoning-on-streams-andrich-background-knowledge
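The four steps can be sketched with a dictionary that maps each fact to its expiration time. This is a minimal illustration, not the implementation from [ESWC2010]. It assumes a single transitivity rule and that an entailed fact expires when the earliest of its premises expires (the slide does not say how the expiration time is computed). For brevity it also re-checks the whole state instead of starting only from the inserts. The name `slide_window` and the data are hypothetical.

```python
import math
from typing import Dict, Iterable, Tuple

Fact = Tuple[str, str]  # (x, y), read as "x discusses y"

def slide_window(state: Dict[Fact, float], inserts: Iterable[Fact],
                 now: float, window: float) -> Dict[Fact, float]:
    # 1. Delete all facts (asserted or inferred) that have expired.
    state = {fact: exp for fact, exp in state.items() if exp > now}
    # New facts entering at time `now` leave the window at now + window.
    for fact in inserts:
        state[fact] = max(state.get(fact, -math.inf), now + window)
    # 2.-4. Derive entailments, annotate them with an expiration time
    #       (assumed: the minimum over the premises), and keep only the
    #       copy with the highest expiration time.
    changed = True
    while changed:
        changed = False
        for (a, b), exp1 in list(state.items()):
            for (c, d), exp2 in list(state.items()):
                if b == c:
                    exp = min(exp1, exp2)
                    if exp > state.get((a, d), -math.inf):
                        state[(a, d)] = exp
                        changed = True
    return state

# Background knowledge never expires; the window is 10 seconds long.
s0 = {("c", "d"): math.inf}
s1 = slide_window(s0, [("a", "b")], now=0, window=10)
s2 = slide_window(s1, [("b", "c")], now=5, window=10)
s3 = slide_window(s2, [], now=10, window=10)
```

At time 10 the fact `("a", "b")` leaves the window, and every fact derived from it disappears in the same pass, with no overestimation or rederivation phase.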
  28. Achievements: Comparative Evaluation 1/2 [ESWC2010]
      •  Hypothesis
         –  Background knowledge does not change and it is fully materialized
         –  Changes only take place in the window
      •  An experiment comparing the time required to compute a new materialization using
         –  Re-computing from scratch (i.e., 1250 ms in our setting)
         –  State of the art incremental approach [Volz2005]
         –  Our approach
      •  Results at increasing % of the materialization changed when the window slides
      [Chart: time in ms (log scale, 10 to 10000) against % of the materialization changed when the window slides (0% to 20%); series: incremental-volz, incremental-stream]
  29. Achievements: Comparative Evaluation 2/2
      •  Comparison of the average time needed to answer a C-SPARQL query using
         –  a forward reasoner,
         –  the naive approach of re-computing the materialization
         –  our approach

                                    forward reasoning   naive approach   incremental-stream
         query (ms)                       5.82               1.61               1.61
         materialization (ms)             0                 15.91               0.28
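Adding the two rows gives a per-approach total, which may make the comparison easier to read. The snippet below only sums the figures reported above; treating query time plus materialization time as the overall cost per answer is an assumption, not a figure from the slide.

```python
# Average times in ms, as reported in the table above.
times = {
    "forward reasoning": {"query": 5.82, "materialization": 0.0},
    "naive approach": {"query": 1.61, "materialization": 15.91},
    "incremental-stream": {"query": 1.61, "materialization": 0.28},
}

# Assumed overall cost per answer: query time plus materialization time.
totals = {name: round(t["query"] + t["materialization"], 2) for name, t in times.items()}
fastest = min(totals, key=totals.get)
```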
  30. Retrospective and Conclusions: Wrap Up
      •  RDF Streams
         –  Notion defined
      •  C-SPARQL
         –  Syntax and semantics defined as a SPARQL extension
         –  Engine designed
         –  Engine implemented based on the decision to keep stream management and query evaluation separated
      •  Experiments with C-SPARQL under simple RDF entailment regimes
         –  window-based selection of C-SPARQL outperforms the standard FILTER-based selection
         –  having formally defined the C-SPARQL semantics, algebraic optimizations are possible
      •  Experiments with C-SPARQL under OWL-RL entailment regimes
         –  efficient incremental updates of deductive closures investigated
         –  our approach outperforms the state of the art when updates come as a stream
  31. Retrospective and Conclusions: Achievements vs. Research Challenges
      •  Relation with data-stream systems
         –  Notion of RDF stream :-|
      •  Query languages for semantic streams
         –  C-SPARQL :-D
      •  Reasoning on Streams
         –  Formal representations for stream reasoning
            •  :-P
         –  Notions of soundness and completeness
            •  :-P
         –  Efficient incremental updates of deductive closures
            •  ESWC 2010 paper :-) ... but much more work is needed!
         –  How to combine streams and background knowledge
            •  ESWC 2010 paper :-| ... but a lot needs to be studied ...
      •  Dealing with incomplete & noisy data
         –  :-P
      •  Distributed and parallel processing
         –  :-P
  32. References
      •  Vision
         [IEEE-IS2009] Emanuele Della Valle, Stefano Ceri, Frank van Harmelen, Dieter Fensel: It's a Streaming World! Reasoning upon Rapidly Changing Information. IEEE Intelligent Systems 24(6): 83-89 (2009)
      •  Continuous SPARQL (C-SPARQL)
         [EDBT2010] Davide Francesco Barbieri, Daniele Braga, Stefano Ceri, Michael Grossniklaus: An Execution Environment for C-SPARQL Queries. EDBT 2010
         [WWW2009] Davide Francesco Barbieri, Daniele Braga, Stefano Ceri, Emanuele Della Valle, Michael Grossniklaus: C-SPARQL: SPARQL for continuous querying. WWW 2009: 1061-1062
         [IJSC2010] Davide Francesco Barbieri, Daniele Braga, Stefano Ceri, Emanuele Della Valle, Michael Grossniklaus: C-SPARQL: a Continuous Query Language for RDF Data Streams. Int. J. Semantic Computing 4(1): 3-25 (2010)
         [IEEE-IS2010] Davide Barbieri, Daniele Braga, Stefano Ceri, Emanuele Della Valle, Yi Huang, Volker Tresp, Achim Rettinger, Hendrik Wermser: Deductive and Inductive Stream Reasoning for Semantic Social Media Analytics. IEEE Intelligent Systems, 30 Aug. 2010
      •  Stream Reasoning
         [ESWC2010] Davide Francesco Barbieri, Daniele Braga, Stefano Ceri, Emanuele Della Valle, Michael Grossniklaus: Incremental Reasoning on Streams and Rich Background Knowledge. In: 7th Extended Semantic Web Conference (ESWC 2010)
      •  Background work
         [Ceri1994] Stefano Ceri, Jennifer Widom: Deriving Incremental Production Rules for Deductive Data. Inf. Syst. 19(6): 467-490 (1994)
         [Volz2005] Raphael Volz, Steffen Staab, Boris Motik: Incrementally Maintaining Materializations of Ontologies Stored in Logic Databases. J. Data Semantics 2: 1-34 (2005)
  33. Thank You! Questions?
      Much More to Come! Keep an eye on http://www.streamreasoning.org
      For more information visit http://www.larkc.eu/