Stream Reasoning: Where we got so far. Oxford 2010.1.18
Upcoming SlideShare
Loading in...5
×
 

Stream Reasoning: Where we got so far. Oxford 2010.1.18

on

  • 944 views

 

Statistics

Views

Total Views
944
Views on SlideShare
944
Embed Views
0

Actions

Likes
0
Downloads
10
Comments
0

0 Embeds 0

No embeds

Accessibility

Categories

Upload Details

Uploaded via as Microsoft PowerPoint

Usage Rights

CC Attribution-NonCommercial-ShareAlike LicenseCC Attribution-NonCommercial-ShareAlike LicenseCC Attribution-NonCommercial-ShareAlike License

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment
  • Service-Finder

Stream Reasoning: Where we got so far. Oxford 2010.1.18 Stream Reasoning: Where we got so far. Oxford 2010.1.18 Presentation Transcript

  • Stream Reasoning Where We Got So Far Oxford - 2010.1.18 http://streamreasoning.org Emanuele Della Valle DEI - Politecnico di Milano [email_address] http://emanueledellavalle.org Joint work with: Davide Francesco Barbieri, Daniele Braga, Stefano Ceri, and Michael Grossniklaus
  • Agenda
    • Motivation
    • Running Example
    • Background
    • Concept
    • Achievements
    • Retrospective and Conclusions
    Oxford, 2011-1-18
  • Motivation It ‘s a streaming World! [IEEE-IS2009]
    • Sensor networks, …
    • traffic engineering, …
    • social networking, …
    • financial markets, …
    • generate streams!
    Oxford, 2011-1-18
  • Running Example Real-Time Streams on the Web
    • Streams are appearing more and more often on the Web in sites that distribute and present information in real-time streams.
    • Checkout http://activitystrea.ms/ for a standard API
    • E.g.
    Oxford, 2011-1-18 Emanuele Della Valle - visit http://streamreasoning.org
  • Running Example Examples of Questions Users are Asking
    • Which topics have my close friends discussed in the last hour ?
    • Which book is my friend likely to read next ?
    • What impact have I been creating with my tweets in the last day ?
    • <query> … <time dimension> ?
    Oxford, 2011-1-18
  • Motivation Problem Statement
    • Making sense
      • in real time
      • of gigantic and inevitably noisy data streams
      • in order to support the decision process of extremely large numbers of concurrent user
    Oxford, 2011-1-18
  • Background What are data streams anyway?
    • Formally:
      • Data streams are unbounded sequences of time-varying data elements
    • Less formally:
      • an (almost) “continuous” flow of information
      • with the recent information being more relevant as it describes the current state of a dynamic system
    time Oxford, 2011-1-18
  • Background Continuous Semantics
    • Processing data streams in the space of one-time semantics is difficult because of the very nature of the underlying data
    • Innovative * assumption: continuous semantics!
      • streams can be consumed on the fly rather than being stored forever and
      • queries are registered and continuously produce answers
    • * This innovation arose in DB community in ’90s
    Oxford, 2011-1-18
  • Background Stream Processing
    • Continuous queries registered over streams that are observed trough windows
    Oxford, 2011-1-18 window input stream stream of answer Registered Continuous Query
  • Background Data Stream Management Systems (DSMS)
    • Research Prototypes
      • Amazon/Cougar (Cornell) – sensors
      • Aurora (Brown/MIT) – sensor monitoring, dataflow
      • Gigascope: AT&T Labs – Network Monitoring
      • Hancock (AT&T) – Telecom streams
      • Niagara (OGI/Wisconsin) – Internet DBs & XML
      • OpenCQ (Georgia) – triggers, view maintenance
      • Stream (Stanford) – general-purpose DSMS
      • Stream Mill (UCLA) - power & extensibility
      • Tapestry (Xerox) – publish/subscribe filtering
      • Telegraph (Berkeley) – adaptive engine for sensors
      • Tribeca (Bellcore) – network monitoring
    • High-tech startups
      • Streambase, Coral8, Apama, Truviso
    • Major DBMS vendors are all adding stream extensions as well
      • Oracle http://www.oracle.com/technology/products/dataint/htdocs/streams_fo.html
      • DB2 http://www.eweek.com/c/a/Database/IBM-DB2-Turns-25-and-Prepares-for-New-Life/
    Oxford, 2011-1-18
  • Background Can the Semantic Web process data stream?
    • The Semantic Web, the Web of Data is doing fine
      • RDF, RDF Schema, SPARQL, OWL, RIF
      • well understood theory,
      • rapid increase in scalability
    • BUT it pretends that the world is static or at best a low change rate both in change-volume and change-frequency
      • ontology versioning
      • belief revision
      • time stamps on named graphs
    • It sticks to the traditional one-time semantics
    Oxford, 2011-1-18
  • Concept Stream Reasoning [IEEE-IS2010 ]
    • Idea origination
      • Can continuous semantics be ported to reasoning?
      • This is an unexplored yet high impact research area!
    • Stream Reasoning
      • Logical reasoning in real time on gigantic and inevitably noisy data streams in order to support the decision process of extremely large numbers of concurrent users.
      • -- S. Ceri , E. Della Valle , F. van Harmelen and H. Stuckenschmidt , 2010
    • Note: making sense of streams necessarily requires processing them against rich background knowledge
    Oxford, 2011-1-18
  • Concept Research Challenges
    • Relation with data-stream systems
      • Just as RDF relates to data-base systems?
    • Query languages for semantic streams
      • Just as SPARQL for RDF but with continuous semantics?
    • Reasoning on Streams
      • Formal representations for stream reasoning
      • Notions of soundness and completeness
      • Efficiency
      • Scalability
    • Dealing with incomplete & noisy data
      • Even more so than on the current Web of Data
    • Distributed and parallel processing
      • Streams are parallel in nature
    Oxford, 2011-1-18
  • Achievements Explored Continuous Semantics for SeWeb
    • We investigated
      • Architecture of a Stream Reasoner
      • RDF streams
        • the natural extension of the RDF data model to the new continuous scenario and
      • Continuous SPARQL (or simply C-SPARQL )
        • the extension of SPARQL for querying RDF streams.
      • Efficient incremental updates of deductive closures
        • specifically considering the nature of data streams
      • Effective inductive stream reasoning (joint work with Siemens - Munich)
        • See paper in IEEE IS special issue on Social Media Analytics
    Oxford, 2011-1-18
  • Achievements Architecture (IEEE-IS2010)
    • Based on the LarKC conceptual framework
    • http://www.larkc.eu
    Oxford, 2011-1-18
  • Achievements RDF Stream [WWW2009,EDBT2010,IJSC2010]
    • RDF Stream Data Type
      • Ordered sequence of pairs, where each pair is made of an RDF triple and its timestamp t
        • (< triple >, t)
    • E.g.,
      • (<:Giulia :likes :Twilight >, 2010-02-12T13:34:41)
      • (<:John :likes :TheLordOfTheRings >, 2010-02-12T13:36:28)
      • (<:Alice :dislikes :Twilight >, 2010-02-12T13:36:28)
    Oxford, 2011-1-18
  • Achievements C-SPARQL [WWW2009,EDBT2010,IJSC2010]
    • We specificied of C-SPARQL syntax
      • Incrementally, from existing specifications
        • Including windows, grouping, aggregates, timestamping
    • We gave the formal semantics of C-SPARQL
      • Query registration, handling overloads
      • Order of evaluation, pattern matching over time, …
    • We investigated efficiency of evaluation
      • Defining a suitable algebra
      • Applying optimizations
      • Efficient materialization of inferred data from streams
    Oxford, 2011-1-18
  • Achievements An Example of C-SPARQL Query
    • Who are the opinion makers? i.e., the users who are likely to influence the behavior of other users who follow them
    • REGISTER STREAM OpinionMakers COMPUTED EVERY 5m AS
    • CONSTRUCT { ?opinionMaker sd:about ?resource }
    • FROM STREAM <http://streamingsocialdata.org/interactions> [RANGE 30m STEP 5m]
    • WHERE {
    • ?opinionMaker ?opinion ?resource .
    • ?follower sioc:follows ?opinionMaker.
    • ?follower ?opinion ?resource.
    • FILTER ( cs:timestamp (?follower) > cs:timestamp (?opinionMaker)
    • && ?opinion != sd:accesses )
    • }
    • HAVING ( COUNT(DISTINCT ?follower) > 3 )
    Oxford, 2011-1-18
  • Achievements An Example of C-SPARQL Query
    • Who are the opinion makers? i.e., the users who are likely to influence the behavior of other users who follow them
    • REGISTER STREAM OpinionMakers COMPUTED EVERY 5m AS
    • CONSTRUCT { ?opinionMaker sd:about ?resource }
    • FROM STREAM <http://streamingsocialdata.org/interactions> [RANGE 30m STEP 5m]
    • WHERE {
    • ?opinionMaker ?opinion ?resource .
    • ?follower sioc:follows ?opinionMaker.
    • ?follower ?opinion ?resource.
    • FILTER ( cs:timestamp (?follower) > cs:timestamp (?opinionMaker)
    • && ?opinion != sd:accesses )
    • }
    • HAVING ( COUNT(DISTINCT ?follower) > 3 )
    Oxford, 2011-1-18 Query registration (for continuous execution) FROM STREAM clause WINDOW RDF Stream added as new ouput format Builtin to access timestamps Aggregates as in SPARQL 1.1
  • Achievements Efficiency of Evaluation 1/3 [IEEE-IS2010]
    • Evaluation of Window-based Selection
    Oxford, 2011-1-18
  • Achievements Efficiency of Evaluation 2/3 [EDBT2010]
    • Several transformations can be applied to algebraic representation of C-SPARQL
    • some recalling well known results from classical relational optimization
      • push of FILTERs and projections
    • some being more specific to the domain of streams.
      • push of aggregates.
    Oxford, 2011-1-18
  • Achievements Efficiency of Evaluation 3/3 [EDBT2010]
    • Push of filters and projections
    Oxford, 2011-1-18
  • Achievements Example of C-SPARQL and Reasoning 1/2
    • What impact have I been creating with my tweets in the last hour? Is it positive or negative? Let ’ s count them …
    • REGISTER QUERY CountPositiveAndNegativeReactions AS
    • PREFIX : <http://ex.org/twitterImpactMining#>
    • SELECT ?t count(?pos) count(?neg)
    • FROM STREAM <http://ex.org/discussions.trdf>
    • [RANGE 30m STEP 30s]
    • WHERE {
    • ?t a :MonitoredTweet .
    • { ?pos :discuss ?t ;
    • :ProduceReaction [ a :PositiveReaction ] .
    • } UNION {
    • ?neg :discuss ?t ;
    • :ProduceReaction [ a :NegativeReaction ] .
    • }
    • } GROUP BY ?t
    Oxford, 2011-1-18 :discuss a owl:TransitiveProperty . :reply rdfs:subPropertyOf :discuss . :retweet rdfs:subPropertyOf :discuss .
  • Achievements Example of C-SPARQL and Reasoning 2/2 Oxford, 2011-1-18 t 1 t 1-1 t 1-2 t 1-3 retweet reply retweet discuss discuss discuss discuss discuss discuss Monitored Positive Negative
  • Achievements State-of-the-Art Approach [Ceri1994,Volz2005]
    • Overestimation of deletion : Overestimates deletions by computing all direct consequences of a deletion.
    • Rederivation : Prunes those estimated deletions for which alternative derivations (via some other facts in the program) exist.
    • Insertion : Adds the new derivations that are consequences of insertions to extensional predicates.
    Oxford, 2011-1-18
  • Achievements our approach [ESWC2010] 1/2
    • Assuption
      • Insertions and deletions are triples respectively entering and exiting the window
      • The window size is known
    • Therefore
      • The time when each triple will expire is known and determined by the window size
        • E.g. if the window is 10s long a triple entering at time t will exit at time t+10s
      • Note: all knowledge can be annotated with an expiration time
        • i.e., background knowledge is annotated with + 
    Oxford, 2011-1-18
  • Achievements our approach [ESWC2010] 2/2
    • The algorithm
      • deletes all triples (asserted or inferred) that have just expired
      • computes the entailments derived by the inserts,
      • annotates each entailed triple with a expiration time, and
      • eliminates from the current state all copies of derived triples except the one with the highest timestamp.
    • learn more
      • http://www.slideshare.net/emanueledellavalle/incremental-reasoning-on-streams-andrich-background-knowledge
    Oxford, 2011-1-18
  • Achievements Comparative Evaluation 1/2 [ESWC2010]
    • Hypothesis
      • Background knowledge do not change and it is fully materialized
      • Changes only take place in the window
    • An experiment comparing the time required to compute a new materialization using
      • Re-computing from scratch (i.e.,1250 ms in our setting)
      • State of the art incremental approach [Volz, 2005]
      • Our approach
    • Results at increasing % of the materialization changed when the window slides
    • .
    Oxford, 2011-1-18
  • Achievements Comparative Evaluation 2/2
    • Comparison of the average time needed to answer a C-SPARQL query using
      • a forward reasoner,
      • the naive approach of re-computing the materialization
      • our approach
    Oxford, 2011-1-18
  • Retrospective and Conclusions Wrap Up
    • RDF Streams
      • Notion defined
    • C-SPARQL
      • Syntax and semantics defined as a SPARQL extension
      • Engine designed
      • Engine implemented based on the decision to keep stream management and query evaluation separated
    • Experiments with C-SPARQL under simple RDF entailment regimes
      • window based selection of C-SPARQL outperforms the standard FILTER based selection
      • having formally defined C-SPARQL semantics algebraic optimizations are possible
    • Experiment with C-SPARQL under OWL-RL entailment regimes
      • efficient incremental updates of deductive closures investigated
      • our approach outperform state-of-the-art when updates comes as stream
    Oxford, 2011-1-18
  • Retrospective and Conclusions Achievements vs. Research Challenges
    • Relation with data-stream systems
      • Notion of RDF stream :-|
    • Query languages for semantic streams
      • C-SPARQL :-D
    • Reasoning on Streams
      • Formal representations for stream reasoning
        • :-P
      • Notions of soundness and completeness
        • :-P
      • Efficient incremental updates of deductive closures
        • ESWC 2010 paper :-) ... but much more work is needed!
      • How to combine streams and background knowledge
        • ESWC 2010 paper :-| ... but a lot needs to be studied ...
    • Dealing with incomplete & noisy data
      • :-P
    • Distributed and parallel processing
      • :-P
    Oxford, 2011-1-18
  • References
    • Vision
      • [IEEE-IS2009] Emanuele Della Valle, Stefano Ceri, Frank van Harmelen, Dieter Fensel It's a Streaming World! Reasoning upon Rapidly Changing Information . IEEE Intelligent Systems 24(6): 83-89 (2009)
    • Continuous SPARQL (C-SPARQL)
      • [EDBT2010] Davide Francesco Barbieri, Daniele Braga, Stefano Ceri and Michael Grossniklaus. An Execution Environment for C-SPARQL Queries . EDBT 2010
      • [WWW2009] Davide Francesco Barbieri, Daniele Braga, Stefano Ceri, Emanuele Della Valle, Michael Grossniklaus: C-SPARQL: SPARQL for continuous querying . WWW 2009: 1061-1062
      • [IJSC2010] Davide Francesco Barbieri, Daniele Braga, Stefano Ceri, Emanuele Della Valle, Michael Grossniklaus: C-SPARQL: a Continuous Query Language for RDF Data Streams. Int. J. Semantic Computing 4(1): 3-25 (2010)
      • [IEEE-IS2010] Davide Barbieri, Daniele Braga, Stefano Ceri, Emanuele Della Valle, Yi Huang, Volker Tresp, Achim Rettinger, Hendrik Wermser, &quot;Deductive and Inductive Stream Reasoning for Semantic Social Media Analytics,&quot; IEEE Intelligent Systems, 30 Aug. 2010.
    • Stream Reasoning
      • [ESWC2010] Davide Francesco Barbieri, Daniele Braga, Stefano Ceri, Emanuele Della Valle, Michael Grossniklaus. Incremental Reasoning on Streams and Rich Background Knowledge. In. 7th Extended Semantic Web Conference (ESWC 2010)
    • Background work
      • [Ceri1994] Stefano Ceri, Jennifer Widom: Deriving Incremental Production Rules for Deductive Data. Inf. Syst. 19(6): 467-490 (1994)
      • [Volz2005] Raphael Volz, Steffen Staab, Boris Motik: Incrementally Maintaining Materializations of Ontologies Stored in Logic Databases. J. Data Semantics 2: 1-34 (2005)
    Oxford, 2011-1-18
  • Thank You! Questions? Much More to Come! Keep an eye on http://www.streamreasoning.org Oxford, 2011-1-18