Order Matters! Harnessing a World of Orderings for Reasoning over Massive Data

More and more applications require real-time processing of massive, dynamically generated, ordered data; order is an essential factor as it reflects recency or relevance. Semantic technologies risk being unable to meet the needs of such applications, as they are not equipped with the appropriate instruments for answering queries over massive, highly dynamic, ordered data sets. This talk argues that some order-aware data management techniques should be exported to the context of semantic technologies, by integrating ordering with reasoning, and by using methods which are inspired by stream and rank-aware data management. This talk systematically explores the problem space, and points both to problems which have been successfully approached and to problems which still need fundamental research, in an attempt to stimulate and guide a paradigm shift in semantic technologies.

Transcript

  • 1. http://streamreasoning.org – Order Matters! Harnessing a World of Orderings for Reasoning over Massive Data. Emanuele Della Valle, emanuele.dellavalle@polimi.it - http://emanueledellavalle.org
  • 2. Acknowledgements § This talk presents the content of a joint paper with Stefan Schlobach (b), Markus Krötzsch (c), Alessandro Bozzon (a), Stefano Ceri (a), and Ian Horrocks (c), to appear in the Semantic Web Journal (SWJ). (a) Politecnico di Milano, (b) Vrije Universiteit Amsterdam, (c) University of Oxford § I also want to thank Frank van Harmelen (b) for his important contribution to the discussion, and Tony Lee (Saltlux), Andreas Schreiber (DLR), and Achim Basermann (DLR) for the valuable discussions on concrete examples of problems that require order-aware reasoning. Moreover, I want to thank Sara Magliacane (b) for her work on SPARQL-RANK and for the slides I use in this presentation, and Marco Balduini (a), Davide Barbieri (a), and Daniele Braga (a) for their work on C-SPARQL § Check out the paper: • http://www.semantic-web-journal.net/content/order-matters-harnessing-world-orderings-reasoning-over-massive-data Trento, Italy, 6.11.2012 Emanuele Della Valle - http://streamreasoning.org/
  • 3. References § The numbers in square brackets refer to references in the SWJ paper • http://www.semantic-web-journal.net/content/order-matters-harnessing-world-orderings-reasoning-over-massive-data § A short selection of references to my papers is available at the end of the presentation.
  • 4. The problem, three use cases, and … § More and more applications require real-time processing of massive, dynamically generated data • Space Situational Awareness • Jet Engine Design • Intelligent Surveillance
  • 5. The Problem – Use case: space junk [Source: http://wordlesstech.com/2011/03/26/space-junk/]
  • 6. The Problem – Use case: jet engine design [Source: http://www.sae.org/mags/aem/10018/]
  • 7. The Problem – Use case: intelligent surveillance [Source: http://youtu.be/I3iDBfB_ZC0]
  • 8. The Problem – … and four common features! § their data is ordered: • naturally ordered by recency, proximity, etc. • intrinsically ordered by precision, popularity, provenance, certainty, trust, etc. • and, in any case, explicitly sortable through attribute values § the answers are also required to come in an ordered fashion • engineers surveying a satellite orbit need to know the largest pieces of debris in closest proximity, with maximal certainty, measured with highest precision, etc. § they require immediate answers at runtime • flight paths have to be adapted as soon as an object on a collision course is detected § and they require inference • rich ontological models describing complex domain knowledge are often used to pose the queries and to interpret the results
  • 9. The Problem – Performance targets [Chart: answer quality at time t (up to fully correct answers) versus computation time t (from real-time behaviour to max runtime), contrasting the current situation with the desired situation (the target).] Note: completeness may not be necessary if all relevant answers are found
  • 10. The Problem – A running example § Imagine a system which • listens to all micro-posts that are published, • knows the geographic location of social media users, • has the ability to detect the topic of each micro-post, and • has modelled relationships between topics in an expressive ontological language § Let us suppose that each of us asks a query like the following to such a system: • Which users of social media, currently leading popular discussions on fashion-related topics, are closest to my current location? What are they saying about the shopping district nearby?
  • 11. The solution space [Diagram: a grid of types of orders (no ordering, natural, cheap to enforce, expensive to enforce, combinations) against types of reasoning (no reasoning, data-driven, query-driven, combinations), with approximation and parallelisation cutting across the whole space.]
  • 12. The solution space – no ordering, no reasoning [Diagram: the same grid, highlighting the no ordering / no reasoning cell.]
  • 13. The solution space – no ordering, no reasoning § Most of the big data solutions currently on the market • BSP (Bulk Synchronous Parallel) • PRAM (Parallel Random Access Machine) • PGAS (Partitioned Global Address Space) • Map-Reduce implementations • and data-centric workflow systems based on them § Some (e.g., Hive and Pig) allow the specification of ordering constraints, but no specific optimisation is provided for top-k or streaming queries § W.r.t. the running example • Good performance and scalability • Limited ability to harness orderings • Missing inference capability
  • 14. The solution space – Order-aware data management [Diagram: the grid, highlighting order-aware data management across natural, cheap-to-enforce, expensive-to-enforce and combined orders, without reasoning.]
  • 15. The solution space – Order-aware data management § When treating massive data, order matters! Data as a sortable entity, where we can enforce orderings easily and logically, e.g., order by • sortable literals • popularity • uncertainty • trust. Most relevant answers first: streaming algorithms § If N is the size of the input, a problem is considered to be “well-solved” if a streaming algorithm exists which requires at most O(poly(log N)) space and time [31]
  • 16. The solution space – Order-aware data management and approximation § Approximate, streaming algorithms can outperform classical, data-bound approaches to this problem by several orders of magnitude [6,14]. § Such approximations can be asymptotic, so that arbitrary accuracy can be achieved [6]. [Chart: answer accuracy at computation time t, approaching fully correct answers, against computation time t.]
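The following Python sketch makes the idea of an approximate streaming algorithm concrete. It uses the classic Misra-Gries frequent-items summary. This is our choice of example; the slide does not say which algorithms [6,14] use. The summary reads the stream once and keeps at most k - 1 counters, however long the stream is. Each kept count underestimates the true count by at most n/k, so very frequent items (for instance, hot topics) are never missed. The topic stream is hypothetical.

```python
from typing import Dict, Hashable, Iterable


def misra_gries(stream: Iterable[Hashable], k: int) -> Dict[Hashable, int]:
    """One-pass frequent-items summary with at most k - 1 counters.

    Every item occurring more than n / k times in a stream of length n survives,
    and each kept count underestimates the true count by at most n / k.
    """
    counters: Dict[Hashable, int] = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement all of them and drop those reaching zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


# Hypothetical stream of micro-post topics (12 items, "fashion" occurs 6 times).
topics = ["fashion", "sport", "fashion", "music", "fashion", "food",
          "fashion", "sport", "art", "fashion", "sport", "fashion"]
print(misra_gries(topics, k=3))  # {'fashion': 3}: true count 6, error <= 12 / 3
```

The trade-off is explicit: memory is bounded by k, and accuracy improves as k grows.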
  • 17. The solution space – Harnessing natural orderings [Diagram: the grid, highlighting natural orderings without reasoning.]
  • 18. The solution space – Harnessing natural orderings § Continuous queries registered over streams that, in most cases, are observed through windows [Diagram: unbounded, time-varying input streams are observed through windows by a registered continuous query, which emits streams of answers.] § Assumption: recent information is more relevant, as it describes the current state of a dynamic system
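Below is a minimal Python sketch of a continuous query over a time-based sliding window. It assumes a hypothetical stream of (timestamp, topic) pairs from the running example. Data is not stored and queried on demand. Instead, the query is re-evaluated every `step` time units over the last `range_` units, and events that leave the window are discarded. The window parameters mirror the RANGE/STEP windows of stream query languages, but the function and its names are illustrative only.

```python
from collections import Counter, deque
from typing import Deque, Dict, Iterable, Iterator, Tuple

Event = Tuple[int, str]  # hypothetical schema: (timestamp in minutes, topic of a micro-post)


def continuous_count(stream: Iterable[Event], range_: int,
                     step: int) -> Iterator[Tuple[int, Dict[str, int]]]:
    """Continuously count micro-posts per topic over a time-based sliding window.

    Every `step` time units the query is re-evaluated over the events with
    close - range_ < timestamp <= close. Timestamps must be non-decreasing.
    """
    window: Deque[Event] = deque()
    close = step  # end of the next window to report

    def evaluate() -> Tuple[int, Dict[str, int]]:
        # Evict events that fell out of the window, then aggregate what is left.
        while window and window[0][0] <= close - range_:
            window.popleft()
        return close, dict(Counter(topic for _, topic in window))

    for ts, topic in stream:
        while ts > close:  # this event belongs to a later window: report the current one
            yield evaluate()
            close += step
        window.append((ts, topic))
    yield evaluate()  # a real stream never ends; here we flush the last window


events = [(1, "fashion"), (3, "sport"), (6, "fashion"), (9, "fashion"), (12, "music")]
for close, counts in continuous_count(events, range_=10, step=5):
    print(close, counts)
# 5 {'fashion': 1, 'sport': 1} / 10 {'fashion': 3, 'sport': 1} / 15 {'fashion': 2, 'music': 1}
```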
  • 19. The solution space – Harnessing natural orderings § The nature of streams requires a paradigmatic change* • from persistent data – to be stored and queried on demand – a.k.a. one-time semantics • to transient data – to be consumed on the fly by continuous queries – a.k.a. continuous semantics § * This paradigmatic change first arose in the DB community [31]
  • 20. The solution space – Harnessing natural orderings § Two types of solutions • Data Stream Management Systems (DSMS) • Complex Event Processors (CEP) § Research prototypes • Amazon/Cougar (Cornell) – sensors • Aurora (Brown/MIT) – sensor monitoring, dataflow • Gigascope (AT&T Labs) – network monitoring • Hancock (AT&T) – telecom streams • Niagara (OGI/Wisconsin) – Internet DBs & XML • OpenCQ (Georgia) – triggers, view maintenance • Stream (Stanford) – general-purpose DSMS • Stream Mill (UCLA) – power & extensibility • Tapestry (Xerox) – publish/subscribe filtering • Telegraph (Berkeley) – adaptive engine for sensors • Tribeca (Bellcore) – network monitoring § High-tech startups • StreamBase, Coral8, Apama, Truviso § Major DBMS vendors are all adding stream extensions as well • IBM InfoSphere Streams • Microsoft StreamInsight • Oracle CEP
  • 21. The solution space – Harnessing natural orderings § DSMSs are optimised for the simplest portion of the query in our running example • retrieve the micro-posts that have been posted recently
  • 22. The solution space – Harnessing other types of orders [Diagram: the grid, highlighting orders that are cheap or expensive to enforce, without reasoning.]
  • 23. The solution space – Harnessing other types of orders § W.r.t. the running example, solutions studied in these two areas allow us to efficiently • retrieve nearby shops that are discussed by popular social media users. § This is a typical top-k query • a limited number of results k • ordered by a scoring function • that combines several criteria – e.g., nearby and most discussed
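The three ingredients of a top-k query (a limit k, a scoring function, several criteria) can be sketched in Python as a simple baseline that scores every candidate and then keeps the k best. The two criteria, their normalisation constants and the equal weights are hypothetical choices for the running example, not part of the talk.

```python
import heapq
from typing import List, NamedTuple


class Candidate(NamedTuple):
    shop: str
    distance_km: float  # hypothetical criterion: distance from the requester
    mentions: int       # hypothetical criterion: posts about the shop by popular users


def score(c: Candidate, max_km: float = 5.0, max_mentions: int = 100) -> float:
    """Hypothetical monotone scoring function: sum of two criteria normalised to [0, 1]."""
    proximity = max(0.0, 1.0 - c.distance_km / max_km)
    popularity = min(c.mentions, max_mentions) / max_mentions
    return proximity + popularity


def top_k(candidates: List[Candidate], k: int) -> List[str]:
    # Baseline: every candidate is scored before the k best are kept.
    return [c.shop for c in heapq.nlargest(k, candidates, key=score)]


shops = [Candidate("A", 0.5, 80), Candidate("B", 4.0, 90),
         Candidate("C", 1.0, 10), Candidate("D", 0.2, 60)]
print(top_k(shops, 2))  # ['A', 'D']
```

This is the "materialize then sort" style of evaluation. Order-aware techniques aim to avoid scoring every candidate.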
  • 24. The solution space – Harnessing other types of orders: Treating order as a first-class citizen § Traditional query evaluation schema: materialize then sort § Order-aware query evaluation schema: split and interleave [Diagram: two plans joining discussed shops with social media users. The traditional plan materializes all join results, orders them all by the proximity of the discussed shop to the issuer and by the popularity of the social media user, and then limits to K. The order-aware plan orders by proximity to the issuer and by popularity separately, interleaves these orderings with the join, and then limits to K. Operators are annotated with intermediate result sizes ranging from tens to hundreds of thousands.]
  • 25. The solution space – Harnessing other types of orders: The split-and-interleave scheme § State of the art • The RDBMS literature (for a survey see [35]) presents the split-and-interleave scheme: 1. Split the evaluation of the scoring function into the evaluation of the single criteria 2. Interleave them with other operators 3. Use partial orders to incrementally construct the final order § Standard assumptions: • Monotone increasing scoring function • Sorted access for each criterion • Random access, when possible, is expensive • No uncertainty in the scores • No uncertainty in the scoring function
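Fagin's Threshold Algorithm (TA) is one classic algorithm built on these assumptions. The sketch below is our illustration of the general idea, not a method presented in the talk. Each criterion is consumed in descending order by sorted access, and random access fetches a newly seen object's other scores. The loop stops once k objects score at least the threshold, which is the aggregate of the last values read from each list. The shop scores are hypothetical. When random access is too expensive, Fagin's NRA variant avoids it at the cost of tracking score bounds.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Ranked = List[Tuple[str, float]]  # (object, score) pairs sorted by descending score


def threshold_algorithm(lists: Sequence[Ranked], k: int,
                        agg: Callable[[Sequence[float]], float] = sum) -> Ranked:
    """Fagin's Threshold Algorithm (TA) for a monotone aggregation function `agg`.

    Assumes every list ranks the same set of objects on one criterion.
    """
    # Random access: one object -> score index per criterion.
    index: List[Dict[str, float]] = [dict(lst) for lst in lists]
    scored: Dict[str, float] = {}
    for depth in range(len(lists[0])):
        # One round of sorted access: read the next entry of every list.
        frontier = []
        for lst in lists:
            obj, s = lst[depth]
            frontier.append(s)
            if obj not in scored:
                # Random access to the other criteria to get the exact aggregate.
                scored[obj] = agg([idx[obj] for idx in index])
        best = sorted(scored.items(), key=lambda kv: kv[1], reverse=True)[:k]
        # No unseen object can score more than agg(frontier), so we can stop early.
        if len(best) == k and best[-1][1] >= agg(frontier):
            return best
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)[:k]


# Hypothetical normalised criteria for four shops.
proximity = [("A", 0.9), ("D", 0.8), ("C", 0.7), ("B", 0.2)]
popularity = [("B", 0.9), ("A", 0.8), ("D", 0.6), ("C", 0.1)]
print([obj for obj, _ in threshold_algorithm([proximity, popularity], k=2)])  # ['A', 'D']
```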
  • 26. The solution space – Harnessing other types of orders: Be aware, it’s a trade-off [Chart with an axis in orders of magnitude.] NOTE: Typically users are interested in 1 <= k <= 100
  • 27. The solution space – Harnessing all types of orders together [Diagram: the grid, highlighting combinations of orders without reasoning.]
  • 28. The solution space – Harnessing all types of orders together § W.r.t. the running example, solutions studied in this area allow us to efficiently • retrieve the nearby shops that popular social media users are currently posting about positively. § This is a typical case of continuous monitoring of top-k queries over sliding windows [45] § A very promising and little explored research area in data management
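Continuous top-k monitoring combines the two previous ideas: after every slide of the window, the top-k answer must be refreshed. The Python sketch below is deliberately naive. It maintains mention counts incrementally over a count-based window but re-ranks from scratch at every slide. The algorithms studied in [45] are designed to avoid exactly this recomputation. The stream of shop mentions is hypothetical.

```python
from collections import Counter, deque
from typing import Iterable, Iterator, List


def continuous_top_k(posts: Iterable[str], window: int, k: int) -> Iterator[List[str]]:
    """After each new post, yield the k shops most mentioned in the last `window` posts.

    Naive baseline: counts are maintained incrementally, but the ranking is
    recomputed from scratch every time the window slides.
    """
    recent: deque = deque()
    counts: Counter = Counter()
    for shop in posts:
        recent.append(shop)
        counts[shop] += 1
        if len(recent) > window:  # count-based window: expire the oldest post
            expired = recent.popleft()
            counts[expired] -= 1
            if counts[expired] == 0:
                del counts[expired]
        yield [name for name, _ in counts.most_common(k)]


# Hypothetical stream of shops mentioned positively by popular users.
stream = ["LaScala", "Duomo", "LaScala", "Brera", "Brera", "Brera"]
for ranking in continuous_top_k(stream, window=3, k=1):
    print(ranking)
```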
  • 29. The solution space – Wrapping up order-aware data management § Two parts of the query in the running example remain difficult to express: • knowing which topics are related to fashion – requires at least a taxonomy of fashion-related topics • computing which recent discussions on social media are popular – requires computing the transitive closure of the discussion relation § Both • are difficult to model without an expressive ontological language (such as OWL 2), and • require complex algorithms that an ontology reasoner can handle natively § Moreover, order-aware data management techniques do not cope with heterogeneity • i.e., data must be translated into one common representation before order-aware data management techniques can be applied.
  • 30. The solution space – Scalable reasoning [Diagram: the grid, highlighting data-driven, query-driven and combined reasoning without ordering.]
  • 31. The Solution Space – Scalable Reasoning § Why? • handling heterogeneity in the input data through ontology-based information integration § In the running example, • ontological background knowledge can be used to model relationships between more specific and more general topics of interest, which can be used to infer which concrete topics are related to fashion § How? • Data-driven methods – scalable methods available in the state of the art • Query-driven methods – research trend, implementations are appearing • Combinations of the previous two – mostly theoretical results
  • 32. The Solution Space – Scalable Reasoning: Data-driven § Ontological language: • OWL 2 RL – aimed at applications that require scalable reasoning without sacrificing too much expressive power – http://www.w3.org/TR/owl2-profiles/#OWL_2_RL § Reasoning approach • Forward chaining: from asserted data to all possible entailments (materialization) § Pros: low query latency § Cons: does not take the actual information need into account § Implementations • OWLIM, Virtuoso, AllegroGraph, and OntoBroker § Research trend • Parallelization using Map-Reduce as the main paradigm – e.g., [33,65] for OWL 2 RL or a fragment thereof [32,64,66,38] • Applying similar techniques to more expressive fragments of OWL – e.g., the ELK reasoner for OWL EL [37]
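As a toy illustration of data-driven (forward-chaining) reasoning, the Python sketch below materializes all consequences of two RDFS-style rules, a tiny subset of what OWL 2 RL reasoners implement. It applies them to a hypothetical topic taxonomy from the running example. The rule set, the string-based triples and the data are illustrative assumptions. Once the closure is computed, answering a query is a simple lookup, which is where the low query latency comes from. The price is that everything is derived, whether or not any query needs it.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]


def materialize(triples: Set[Triple]) -> Set[Triple]:
    """Naive forward chaining to a fixpoint with two RDFS-style rules:
         (C subClassOf D), (D subClassOf E)  =>  (C subClassOf E)
         (x type C),       (C subClassOf D)  =>  (x type D)
    """
    closure = set(triples)
    while True:
        sub = {(s, o) for s, p, o in closure if p == "subClassOf"}
        derived = {(c, "subClassOf", e) for c, d in sub for d2, e in sub if d == d2}
        derived |= {(x, "type", d) for x, p, c in closure if p == "type"
                    for c2, d in sub if c == c2}
        if derived <= closure:  # fixpoint: nothing new can be inferred
            return closure
        closure |= derived


# Hypothetical topic taxonomy and topic annotations of micro-posts.
kb = {("Shoes", "subClassOf", "Apparel"), ("Apparel", "subClassOf", "Fashion"),
      ("post1", "type", "Shoes"), ("post2", "type", "Sport")}
inferred = materialize(kb)
print(("post1", "type", "Fashion") in inferred)  # True
```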
  • 33. The Solution Space – Scalable Reasoning: Query-driven § Ontological language • OWL 2 QL – designed for query answering in LOGSPACE w.r.t. the size of the data, with the expressivity of conceptual models (e.g., UML class diagrams) – http://www.w3.org/TR/owl2-profiles/#OWL_2_QL § Reasoning approach • Backward chaining: from the query to the asserted facts • Query rewriting: from an ontological query to a set of SQL queries § Pros: limits the search space by considering the actual query § Cons: the number of rewritings can grow exponentially § Implementations • QuOnto, Owlgres, and Requiem § Research trend • Extending query rewriting to more expressive ontology languages – e.g., Datalog± [27,4] • Parallelization using Map-Reduce – e.g., QueryPIE
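For contrast with materialization, the Python sketch below shows query rewriting in its simplest form. It chains backwards over subclass axioms and turns the atomic query "?x type Fashion" into a union of atomic queries, one per subclass. That union is then evaluated directly over the asserted facts. In a system of the kind listed above, each element of the union would become one SQL query. The taxonomy format and data are hypothetical. The exponential blow-up mentioned on the slide arises with conjunctive queries and richer OWL 2 QL axioms, which this sketch does not cover.

```python
from typing import Dict, Iterable, List, Set, Tuple


def rewrite(cls: str, subclasses: Dict[str, List[str]]) -> Set[str]:
    """Rewrite the atomic query (?x type cls) into the classes whose asserted
    instances answer it, chaining backwards over subClassOf axioms.
    `subclasses` maps a class to its direct subclasses (hypothetical format)."""
    rewritings = {cls}
    frontier = [cls]
    while frontier:
        for sub in subclasses.get(frontier.pop(), []):
            if sub not in rewritings:
                rewritings.add(sub)
                frontier.append(sub)
    return rewritings


def answer(cls: str, subclasses: Dict[str, List[str]],
           facts: Iterable[Tuple[str, str]]) -> Set[str]:
    # Evaluate the union of rewritten queries directly over the asserted
    # (individual, class) facts: nothing is materialized.
    targets = rewrite(cls, subclasses)
    return {x for x, c in facts if c in targets}


taxonomy = {"Fashion": ["Apparel"], "Apparel": ["Shoes", "Bags"]}  # hypothetical
facts = [("post1", "Shoes"), ("post2", "Sport"), ("post3", "Bags")]
print(sorted(answer("Fashion", taxonomy, facts)))  # ['post1', 'post3']
```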
  • 34. The Solution Space – Scalable Reasoning: Combinations § Ontological language • Subject to research § Reasoning approach • Combine the advantages of data- and query-driven approaches § State of the art • Magic Sets technique [1] § Recent theoretical results • for a limited fragment of OWL EL [44] • for existential rules [4]
  • 35. The Solution Space – Scalable Reasoning: Approximation § Many rule-based systems compute only part of the entailed consequences by employing a set of rules that cannot derive all results • e.g., Jena, Sesame, OWLIM, and Virtuoso § A typical approach is to approximate the input information by restricting it to a simpler ontology language that is then processed with a more efficient, sound and complete algorithm • e.g., TrOWL [48] and Screech [62]. § Approximate reasoning is used as a sub-method in many sound and complete reasoners • e.g., the OWL reasoner HermiT first computes the syntactically told class hierarchy before using more complex algorithms for a complete subsumption check. § None of the above, however, deals with or takes advantage of orderings of any kind. § A number of interesting research challenges thus remain open.
  • 36. The solution space – Wrap-up of the talk so far [Diagram: the grid, highlighting order-aware data management (orders without reasoning) and scalable reasoning (reasoning without ordering).]
  • 37. The solution space – Reasoning with streaming algorithms [Diagram: the grid, adding stream reasoning (natural orders with reasoning), top-k reasoning (orders that are cheap or expensive to enforce, with reasoning) and order-aware reasoning (combinations) alongside order-aware data management and scalable reasoning.]
  • 38. The solution space – Reasoning with streaming algorithms [Diagram: the grid, highlighting stream reasoning.]
  • 39. The solution space – Stream Reasoning [IEEE-IS2009] § W.r.t. the running example, solutions studied in this area allow us to efficiently • compute which recent discussions on social media are popular § For instance, how many micro-posts discussed (either by replying or by retweeting) my tweet? [Figure: a discussion graph in which tweets t2 to t8 reply to or retweet tweet t1, directly or indirectly; since replying and retweeting are both forms of discussing, and discussing is transitive, the answer is 7.]
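The counting question reduces to a transitive closure. A post discusses t1 if it replies to or retweets t1, or if it discusses a post that does. The Python sketch below assumes exactly this (reply and retweet imply discuss, and discuss is transitive) and counts the posts reachable backwards from t1. The edge list is hypothetical and only chosen so that t2 to t8 all end up discussing t1; the exact edges of the slide's figure are not reproduced.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str, str]  # (post, property, post)


def discussants(target: str, edges: List[Edge]) -> Set[str]:
    """Posts that discuss `target` directly or transitively, assuming (as on the
    slide) that reply and retweet imply discuss and that discuss is transitive."""
    incoming: Dict[str, List[str]] = defaultdict(list)
    for src, prop, dst in edges:
        if prop in ("reply", "retweet", "discuss"):
            incoming[dst].append(src)
    seen: Set[str] = set()
    stack = [target]
    while stack:  # depth-first traversal of the reversed discuss graph
        for src in incoming[stack.pop()]:
            if src not in seen:
                seen.add(src)
                stack.append(src)
    return seen


# Hypothetical edges; the exact graph drawn on the slide is not reproduced here.
edges = [("t2", "reply", "t1"), ("t3", "retweet", "t1"), ("t4", "reply", "t2"),
         ("t5", "reply", "t3"), ("t6", "retweet", "t3"), ("t7", "reply", "t4"),
         ("t8", "reply", "t5")]
print(len(discussants("t1", edges)))  # 7
```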
  • 40. The solution space – Stream Reasoning features [Table: which features traditional data processing, stream processing and automatic reasoning offer, and which stream reasoning aims at. Features: processing streams; handling large datasets; reactivity (real-time); expressing fine-grained queries; capturing knowledge; access to persistent data.]
  • 41. The solution space – Stream Reasoning definition § Making sense [IEEE-IS2010] • in real time • of multiple, heterogeneous, gigantic and inevitably noisy data streams • in order to support the decision process of extremely large numbers of concurrent users § Note: making sense of streams necessarily requires processing them against rich background knowledge, an unsolved problem in databases
  • 42. The solution space – Architecture of a Stream Reasoner § Continuous reasoning tasks registered over streams that, in most cases, are observed through windows [Diagram: input streams are observed through windows by registered continuous reasoning tasks, which emit streams of answers.]
  • 43. The solution space – Stream Reasoning: PoliMi’s Achievements § RDF Stream data type [WWW2009] • (virtually) represents heterogeneous data streams § C-SPARQL query language [WWW2009] • expresses fine-grained continuous queries • it is “compiled down” to keep performance high § Incremental RDFS++ reasoning [ESWC2010] • allows for domain knowledge exploitation § C-SPARQL Engine [EDBT2010] • fully operational prototype • deployed in award-winning applications (e.g., Bottari [JWS2012])
  • 44. The solution space – Stream Reasoning: PoliMi’s Achievements [Diagram: the solution-space grid with order-aware data management and scalable reasoning.]
  • 45. The solution space – Stream Reasoning “alla PoliMi”: RDF Stream § RDF Stream data type • Ordered sequence of pairs, where each pair is made of an RDF triple and its timestamp § Timestamps are not required to be unique, but they must be non-decreasing § E.g., (<:Alice :posts :post1>, 2010-02-12T13:34:41) (<:post1 :talksAboutPositively :LaScala>, 2010-02-12T13:34:41) (<:Bob :posts :post2>, 2010-02-12T13:36:28) (<:post2 :talksAboutNegatively :Duomo>, 2010-02-12T13:36:28)
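The definition of an RDF stream (timestamped triples, with timestamps that may repeat but never decrease) can be modelled in a few lines of Python. Representing triples as string tuples and timestamps as `datetime` values is our own assumption, not the data structure of the C-SPARQL engine. The checker simply passes items through and rejects any item that goes back in time. It is run on the four items of the example above.

```python
from datetime import datetime
from typing import Iterable, Iterator, NamedTuple, Optional, Tuple


class StreamItem(NamedTuple):
    triple: Tuple[str, str, str]  # (subject, predicate, object)
    timestamp: datetime


def rdf_stream(items: Iterable[StreamItem]) -> Iterator[StreamItem]:
    """Yield items unchanged, enforcing the RDF stream constraint:
    timestamps may repeat but must never decrease."""
    last: Optional[datetime] = None
    for item in items:
        if last is not None and item.timestamp < last:
            raise ValueError(f"timestamp {item.timestamp} precedes {last}")
        last = item.timestamp
        yield item


ts = datetime.fromisoformat
example = [
    StreamItem((":Alice", ":posts", ":post1"), ts("2010-02-12T13:34:41")),
    StreamItem((":post1", ":talksAboutPositively", ":LaScala"), ts("2010-02-12T13:34:41")),
    StreamItem((":Bob", ":posts", ":post2"), ts("2010-02-12T13:36:28")),
    StreamItem((":post2", ":talksAboutNegatively", ":Duomo"), ts("2010-02-12T13:36:28")),
]
print(len(list(rdf_stream(example))))  # 4: equal timestamps are allowed
```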
  • 46. MEMO: SPARQL
  • 47. The solution space – Stream Reasoning “alla PoliMi”: Where C-SPARQL Extends SPARQL
  • 48. The solution space – Stream Reasoning “alla PoliMi”: An Example of C-SPARQL Query § Who are the opinion makers? i.e., the users who are likely to influence the behavior of other users who follow them
      REGISTER STREAM OpinionMakers COMPUTED EVERY 5m AS
      CONSTRUCT { ?opinionMaker sd:about ?resource }
      FROM STREAM <http://streamingsocialdata.org/interactions> [RANGE 30m STEP 5m]
      WHERE {
        ?opinionMaker ?opinion ?resource .
        ?follower sioc:follows ?opinionMaker .
        ?follower ?opinion ?resource .
        FILTER ( cs:timestamp(?follower) > cs:timestamp(?opinionMaker)
                 && ?opinion != sd:accesses )
      }
      HAVING ( COUNT(DISTINCT ?follower) > 3 )
  • 49. The solution space – Stream Reasoning “alla PoliMi”: An Example of C-SPARQL Query § The same query, annotated with the C-SPARQL extensions to SPARQL: • REGISTER STREAM OpinionMakers COMPUTED EVERY 5m AS – query registration (for continuous execution), with RDF Stream added as a new output format • FROM STREAM clause • [RANGE 30m STEP 5m] – window • cs:timestamp – built-in to access timestamps • HAVING ( COUNT(DISTINCT ?follower) > 3 ) – aggregates as in SPARQL 1.1
  • 50. The solution space – Stream Reasoning “alla PoliMi”: Efficiency of C-SPARQL Query Evaluation § The window-based selection of C-SPARQL outperforms the standard FILTER-based selection
  • 51. The solution space – Stream Reasoning “alla PoliMi”: Efficiency of C-SPARQL Query Evaluation § The C-SPARQL algebra allows filters and projections to be pushed down
  • 52. The solution space – Stream Reasoning “alla PoliMi”: High Throughput of the C-SPARQL Engine
  • 53. The solution space – Stream Reasoning “alla PoliMi”: Incremental Materialization Evaluation § baseline: re-computing the materialization from scratch § state of the art: incremental maintenance of materialized views § PoliMi’s incremental stream approach [ESWC2010] [Chart comparing the three approaches against the % of the materialization changed when the window slides.]
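The intuition behind tagging each triple with an expiration time can be sketched in Python for a single transitive property (e.g., discuss). This is a simplified illustration in the spirit of [ESWC2010], under our own assumptions (integer timestamps, one property, a fixed-length window). It is not a reproduction of the published algorithm. A derived triple inherits the earliest expiration time of its premises, and the latest such time is kept when it has several derivations. Sliding the window therefore only deletes expired triples, and nothing has to be recomputed.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]  # (subject, object) of a single transitive property


class IncrementalClosure:
    """Transitive closure of one property over a sliding window, maintained by
    tagging each (explicit or derived) triple with an expiration time."""

    def __init__(self, window: int) -> None:
        self.window = window
        self.expiry: Dict[Edge, int] = {}

    def add(self, edge: Edge, now: int) -> None:
        work: List[Tuple[Edge, int]] = [(edge, now + self.window)]
        while work:
            (a, b), exp = work.pop()
            current = self.expiry.get((a, b))
            if current is not None and current >= exp:
                continue  # already known to hold at least as long
            self.expiry[(a, b)] = exp
            for (x, y), e in list(self.expiry.items()):
                if y == a:  # (x, a) + (a, b) => (x, b), valid while both premises are
                    work.append(((x, b), min(e, exp)))
                if x == b:  # (a, b) + (b, y) => (a, y)
                    work.append(((a, y), min(exp, e)))

    def slide(self, now: int) -> None:
        # Expired triples are simply dropped: no re-derivation from scratch.
        self.expiry = {t: e for t, e in self.expiry.items() if e > now}


closure = IncrementalClosure(window=10)
closure.add(("t2", "t1"), now=0)  # expires at 10
closure.add(("t3", "t2"), now=5)  # expires at 15; derives (t3, t1) expiring at 10
closure.add(("t3", "t1"), now=6)  # explicit evidence extends (t3, t1) to 16
closure.slide(now=12)
print(sorted(closure.expiry.items()))  # [(('t3', 't1'), 16), (('t3', 't2'), 15)]
```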
  • 54. The solution space – Stream Reasoning “alla PoliMi”: Incremental Maintenance and Query Latency § Comparison of the average time needed to answer a C-SPARQL query using • a backward reasoner • the naive approach of re-computing the materialization • PoliMi’s incremental-stream approach [Chart: average time in ms for each approach, split into query time and materialization time.]
  • 55. The solution space – Stream Reasoning Community Achievements § RDF Stream data type • Adopted by most of the research groups active on Stream Reasoning • Alternative solution based on two timestamps used in eTalis § Continuous query language • C-SPARQL was extended by the community • Alternative solutions have been studied – without FROM STREAM clause [CQELS] – oriented to complex event processing [2] § Reasoning • Data-driven for RDFS++ [ESWC2010] • Goal-driven for temporal logics (eTalis) [2] • Time-decaying logic programs [26] • Inductive reasoning [IEEE-IS2010] § Implementation experiences • C-SPARQL Engine • eTalis / EP-SPARQL • CQELS • S2R
  • 56. The solution space – Stream Reasoning next steps § Scientific • Notions of soundness and completeness • More expressive reasoning – with minor loss in throughput – and predictable loss in scalability • Dealing with incomplete & noisy data • Parallelization and distribution of the processing § Technical • Prove effectiveness and efficacy in specific application domains • Better integrate continuous semantics with Linked Data • Design and develop a software framework to simplify stream reasoning application development § Organizational • Standardize RDF Stream, C-SPARQL, Streaming Linked Data, etc.
  • 57. The solution space – Wrap-up of Stream Reasoning [Diagram: the grid, highlighting stream reasoning alongside order-aware data management and scalable reasoning.]
  • 58. The solution space – Top-k reasoning [Diagram: the grid, highlighting top-k reasoning alongside stream reasoning, order-aware data management and scalable reasoning.]
  • 59. The solution space – Top-k reasoning approach § In traditional reasoning, ranking of results is normally considered a task that makes scaling inference to massive data sets even more hopeless § Top-k reasoning should, instead, overcome this common practice and interleave ordering and reasoning § W.r.t. the running example, top-k reasoning should allow us to efficiently • compute which are the top-k social media users who are well known to lead discussions on fashion-related topics and are closest to the requester’s current location.
  • 60. The solution space – Top-k reasoning attempts § SoftFacts [60] • an ontology-mediated top-k information retrieval system over relational databases § SPARQL-Rank [13] • adds order to the SPARQL algebra as a first-class citizen and experimentally shows the performance gain § AnQL [41] • extends SPARQL to querying RDFS data annotated with values from a bounded lattice (and thus comes with a partial ordering). § Notion of exact top-k closure of an ontology w.r.t. a query and a scoring function [53]
  • 61. The solution space – Top-k queries in SPARQL 1.1 § Retrieve the best 10 offers ordered by a function of user ratings of the product and offer price:
      SELECT ?product ?offer
             (g1(?avgRat1) + g2(?avgRat2) + g3(?price) AS ?score)
      WHERE {
        ?product hasAvgRat1 ?avgRat1 .
        ?product hasAvgRat2 ?avgRat2 .
        ?product hasName ?name .
        ?product hasOffers ?offer .
        ?offer hasPrice ?price
      }
      ORDER BY DESC(?score)
      LIMIT 10
    § Slow: tens of seconds on 5M (could be improved to milliseconds)
  • 62. The solution space – Top-k queries in SPARQL 1.1: Challenges § Adapting SQL optimizations to SPARQL is not straightforward: • different algebra • different cost of data access in native RDF triplestores – sorted access is slow, random access is fast • additional optimization dimensions – pushing the evaluation of BGPs into the storage § Research tasks • a new algebra for SPARQL where order is a first-class citizen • new algorithms, and • optimization techniques
  • 63. The solution space – Top-k queries in SPARQL 1.1: The SPARQL-Rank algebra § Extends the standard SPARQL algebra § Ranked set of mappings: a set of mappings augmented with an order relation [Figure: new operators, extended operators, and equivalences.]
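  • 64. The solution space – SPARQL-Rank algebra: The new Rank Operator § Scoring function F(p1, p2) = ?p1 + ?p2 § The rank operator ρp1 orders Ω by F_p1, the best score a mapping can still reach once ?p1 has been evaluated (here F_p1 = ?p1 + 1, i.e., the not-yet-evaluated ?p2 counts with its maximum value 1)
      Ω:
      |    | ?x | ?y | ?p1 | ?p2 |
      | µ1 | 1  | 8  | 0.8 | 0.8 |
      | µ2 | 3  | 3  | 0.3 | 0.6 |
      | µ3 | 3  | 4  | 0.4 | 0.6 |
      ρp1(Ω):
      |    | ?x | ?y | ?p1 | ?p2 | F_p1 |
      | µ1 | 1  | 8  | 0.8 | 0.8 | 1.8  |
      | µ3 | 3  | 4  | 0.4 | 0.6 | 1.4  |
      | µ2 | 3  | 3  | 0.3 | 0.6 | 1.3  |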
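The rank operator's behaviour on the example can be reproduced with a short Python sketch. Evaluated predicates contribute their actual value. Predicates not yet evaluated contribute their maximum possible value, assumed to be 1 as the F_p1 column suggests, and the mappings are sorted by the resulting upper bound. Representing mappings as dictionaries is our simplification, not SPARQL-Rank's data structure.

```python
from typing import Dict, List, Sequence

SolutionMapping = Dict[str, float]


def upper_bound(mu: SolutionMapping, evaluated: Sequence[str],
                preds: Sequence[str], max_score: float = 1.0) -> float:
    """F_P for F = sum of preds: evaluated predicates contribute their value,
    the others their maximum possible value (assumed to be 1)."""
    return sum(mu[p] if p in evaluated else max_score for p in preds)


def rank(omega: List[SolutionMapping], evaluated: Sequence[str],
         preds: Sequence[str]) -> List[SolutionMapping]:
    """rho_P: order the mappings by the upper bound of F given the predicates P."""
    return sorted(omega, key=lambda mu: upper_bound(mu, evaluated, preds), reverse=True)


omega = [{"x": 1, "y": 8, "p1": 0.8, "p2": 0.8},   # mu1
         {"x": 3, "y": 3, "p1": 0.3, "p2": 0.6},   # mu2
         {"x": 3, "y": 4, "p1": 0.4, "p2": 0.6}]   # mu3
for mu in rank(omega, ["p1"], ["p1", "p2"]):
    print(mu["x"], mu["y"], round(upper_bound(mu, ["p1"], ["p1", "p2"]), 2))
# prints: 1 8 1.8, then 3 4 1.4, then 3 3 1.3
```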
  • 65. The solution space – SPARQL-Rank algebra: The redefined Join Operator
      Ωp1:
      |    | ?x | ?y | ?p1 | ?p2 | F_p1 |
      | µ1 | 1  | 8  | 0.8 | 0.8 | 1.8  |
      | µ3 | 3  | 4  | 0.4 | 0.6 | 1.4  |
      | µ2 | 3  | 3  | 0.3 | 0.6 | 1.3  |
      Ω'p2:
      |    | ?x | ?z | ?p2 | F_p2 |
      | µ4 | 1  | 9  | 0.8 | 1.8  |
      | µ5 | 3  | 0  | 0.6 | 1.6  |
      Ωp1 ⋈ Ω'p2:
      |         | ?x | ?y | ?z | ?p1 | ?p2 | F_p1∪p2 |
      | µ1 ∪ µ4 | 1  | 8  | 9  | 0.8 | 0.8 | 1.6     |
      | µ3 ∪ µ5 | 3  | 4  | 0  | 0.4 | 0.6 | 1.0     |
      | µ2 ∪ µ5 | 3  | 3  | 0  | 0.3 | 0.6 | 0.9     |
  • 66. The solution space – SPARQL-Rank algebra: Rank Join Algorithms
§  Different algorithms, depending on the type of access available on the inputs:
   •  Hash Rank-Join (RankJoin): sorted access on both inputs
      –  e.g. HRJN [Ilyas2004]
   •  Random Access Rank-Join (RA-RankJoin): sorted and random access on both inputs
      –  e.g. RA-HRJN [Ilyas2004]
   •  RankSequence (e.g. RSEQ): sorted access on one input, random access on the other; NEW [ISWC2012]
      –  Minimum sorted access
      –  Leverages random access
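A minimal Python sketch of the hash rank-join idea behind HRJN [Ilyas2004], assuming both inputs offer only sorted access (descending score), the join is an equi-join on a single key, and the combined score is the sum of the two input scores. Inputs are read alternately; each new row is probed against the hash table of the other input, and a buffered result is emitted only once its score reaches the threshold, the best score any combination not yet produced could still reach. Names and parameters are hypothetical; this is not the ARQ-Rank code.

```python
import heapq
from itertools import count

Row = tuple[str, float]  # (join key, score); each input sorted by score, descending


def hash_rank_join(left: list[Row], right: list[Row], k: int) -> tuple[list[tuple[float, str]], int]:
    """Top-k equi-join on the key, combined score = sum of the two scores.

    Returns the top-k (score, key) results and the number of rows read
    through sorted access.
    """
    if not left or not right or k <= 0:
        return [], 0
    inputs = (left, right)
    seen: tuple[dict, dict] = ({}, {})  # per-input hash table: key -> scores read so far
    pos = [0, 0]                        # rows consumed from each input
    top = [left[0][1], right[0][1]]     # highest score of each input
    last = list(top)                    # score of the last row read from each input
    buffer: list[tuple[float, int, str]] = []  # max-heap of candidates (negated scores)
    tie = count()                       # insertion order breaks score ties
    out: list[tuple[float, str]] = []
    side = 0
    while len(out) < k:
        if pos[0] == len(left) and pos[1] == len(right):
            # Both inputs exhausted: every buffered result is now final.
            while buffer and len(out) < k:
                neg, _, rkey = heapq.heappop(buffer)
                out.append((-neg, rkey))
            break
        if pos[side] == len(inputs[side]):
            side = 1 - side             # keep reading the input that still has rows
        key, score = inputs[side][pos[side]]
        pos[side] += 1
        last[side] = score
        seen[side].setdefault(key, []).append(score)
        # Probe the other input's hash table for join partners.
        for other in seen[1 - side].get(key, []):
            heapq.heappush(buffer, (-(score + other), next(tie), key))
        # Best score any not-yet-produced combination could still reach.
        threshold = max(top[0] + last[1], last[0] + top[1])
        while buffer and len(out) < k and -buffer[0][0] >= threshold:
            neg, _, rkey = heapq.heappop(buffer)
            out.append((-neg, rkey))
        side = 1 - side                 # alternate between the inputs
    return out, pos[0] + pos[1]


left = [("b", 1.0), ("a", 0.9), ("c", 0.5), ("d", 0.2)]
right = [("a", 1.0), ("b", 0.9), ("d", 0.3), ("c", 0.1)]
result, rows_read = hash_rank_join(left, right, k=2)  # top-2 after reading 4 of the 8 rows
```

The early stop is the point: the two best results are certified by the threshold before the lower-scoring rows are ever read.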
  • 67. The solution space – SPARQL-Rank algebra: The new Algebraic Equivalences
[Figure: Split]
  • 68. The solution space – SPARQL-Rank algebra: The new Algebraic Equivalences
[Figure: Interleave]
  • 69. The solution space – SPARQL-Rank algebra: Planning Strategies
§  Apply algebraic equivalences
§  Result: three possible strategies
   1. Rank of BGPs
   2. Interleaved
   3. Rank Join
  • 70. The solution space – SPARQL-Rank algebra: Planning Strategies: rank of BGPs (ROB)
§  Replace the monolithic scoring function with a number of incremental rank operators (ρ)
[Figure: query plans for the example query (predicates abbreviated as hasA1, hasA2, hasN, hasO, hasP): (a) the standard plan, SLICE [0,10] over ORDER [?score] over EXTEND [?score = g1(?a1)+g2(?a2)+g3(?p1)] over the whole BGP read by a seqScan; (b) the ROB plan, in which the monolithic scoring is replaced by rank operators on g1(?a1), g2(?a2) and g3(?p1)]
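The ROB idea can be checked with a small Python sketch, assuming the Split equivalence means that ranking on several criteria can be rewritten as a chain of single-criterion rank operators. Each operator reorders by the upper bound over the criteria evaluated so far (unevaluated criteria are assumed to contribute their maximum, 1), so after the last one the order equals ORDER BY on the monolithic score. Criteria values are random and normalized to [0, 1] by assumption; names are hypothetical, and sorting stands in for the incremental operators, so this shows equivalence, not performance.

```python
import random

CRITERIA = ("g1", "g2", "g3")  # score = g1(?a1) + g2(?a2) + g3(?p1), each in [0, 1]
MAX_CRITERION = 1.0            # assumed maximum of every criterion


def bound(m: dict, evaluated: frozenset) -> float:
    """Upper bound of the score once only the `evaluated` criteria are known."""
    return sum(m[c] if c in evaluated else MAX_CRITERION for c in CRITERIA)


def rho(mappings: list, evaluated: frozenset, criterion: str) -> tuple[list, frozenset]:
    """Rank operator: evaluate one more criterion and reorder by the new bound."""
    evaluated = evaluated | {criterion}
    return sorted(mappings, key=lambda m: bound(m, evaluated), reverse=True), evaluated


random.seed(0)
solutions = [{"id": i, **{c: random.random() for c in CRITERIA}} for i in range(1000)]

# Monolithic plan: EXTEND ?score, ORDER BY DESC(?score), SLICE [0, 10].
monolithic = sorted(solutions, key=lambda m: m["g1"] + m["g2"] + m["g3"], reverse=True)[:10]

# ROB plan: rho_g1, then rho_g2, then rho_g3, then SLICE [0, 10].
ranked, evaluated = solutions, frozenset()
for c in CRITERIA:
    ranked, evaluated = rho(ranked, evaluated, c)
rob = ranked[:10]
```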
  • 71. The solution space – SPARQL-Rank algebra: Planning Strategies: Interleaved (INTER)
§  Separate the triple patterns into two groups:
   •  triple patterns that influence the ranking
   •  triple patterns that don't influence the ranking
[Figure: query plans for the example query: (a) standard, (b) ROB, (c) INTER]
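The first step of INTER, separating the patterns that influence the ranking from those that do not, can be sketched in Python as a partition of the BGP. The rule is an assumption: patterns are grouped into connected components through shared variables, ignoring the common join variable ?pr, and a component influences the ranking if it binds a variable used by the scoring function. The patterns mirror the abbreviated ones in the plans; the tuple representation and `split_patterns` are hypothetical.

```python
from collections import defaultdict

Pattern = tuple[str, str, str]  # (subject, predicate, object); variables start with "?"


def is_var(term: str) -> bool:
    return term.startswith("?")


def split_patterns(bgp: list[Pattern], join_var: str, score_vars: set[str]):
    """Partition a BGP into ranking-relevant and ranking-neutral groups."""
    parent = list(range(len(bgp)))  # union-find over pattern indexes

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    owner: dict[str, int] = {}  # first pattern that mentions each variable
    for i, pattern in enumerate(bgp):
        for term in pattern:
            if is_var(term) and term != join_var:
                if term in owner:
                    parent[find(i)] = find(owner[term])  # shared variable: same group
                else:
                    owner[term] = i

    components: dict[int, list[Pattern]] = defaultdict(list)
    for i, pattern in enumerate(bgp):
        components[find(i)].append(pattern)

    ranking, neutral = [], []
    for group in components.values():
        group_vars = {t for p in group for t in p if is_var(t)}
        (ranking if group_vars & score_vars else neutral).append(group)
    return ranking, neutral


bgp = [
    ("?pr", "hasA1", "?a1"),
    ("?pr", "hasA2", "?a2"),
    ("?pr", "hasN", "?n"),
    ("?pr", "hasO", "?of"),
    ("?of", "hasP", "?p1"),
]
ranking, neutral = split_patterns(bgp, "?pr", {"?a1", "?a2", "?p1"})
```

With this rule only `?pr hasN ?n` ends up in the neutral group, while `?pr hasO ?of` stays with `?of hasP ?p1` because it is needed to reach the price.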
  • 72. The solution space – SPARQL-Rank algebra: Planning Strategies: Rank-Join (RJ)
§  Split the BGP into one sub-pattern for each ranking criterion
§  Use the most appropriate rank-join algorithm for the type of access available on each input
[Figure: query plans for the example query: (a) standard; (b) RJ, in which the sub-patterns for g1(?a1), g2(?a2) and g3(?p1) are combined with RankJoin operators on ?pr = ?pr]
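When one input offers only sorted access and the other offers cheap random access (e.g. an index lookup on the join key), a rank sequence can read the sorted input in score order, look up each row's partner directly, and stop as soon as no unread row can beat the best buffered result. A minimal Python sketch, assuming a sum scoring function, a known maximum score on the random-access side, and at most one partner per key; names are hypothetical and this is not the RSEQ implementation of [ISWC2012].

```python
import heapq


def rank_sequence(sorted_input: list[tuple[str, float]], lookup: dict[str, float],
                  max_other: float, k: int) -> tuple[list[tuple[float, str]], int]:
    """Top-k join of a sorted-access input with a random-access input.

    sorted_input: (key, score) pairs in descending score order (sorted access).
    lookup: key -> score of the other input, probed directly (random access).
    max_other: largest score the random-access side can contribute.
    Returns the top-k (total, key) results and the number of sorted reads.
    """
    buffer: list[tuple[float, str]] = []  # max-heap of candidates (negated totals)
    out: list[tuple[float, str]] = []
    reads = 0
    for key, score in sorted_input:
        # Unread rows score at most `score`, so no later result can exceed
        # score + max_other: everything at or above that bound is final.
        while buffer and len(out) < k and -buffer[0][0] >= score + max_other:
            neg, best_key = heapq.heappop(buffer)
            out.append((-neg, best_key))
        if len(out) == k:
            break                          # stop early: no more sorted access
        reads += 1
        if key in lookup:                  # random access instead of a second scan
            heapq.heappush(buffer, (-(score + lookup[key]), key))
    while buffer and len(out) < k:         # input exhausted: flush the buffer
        neg, best_key = heapq.heappop(buffer)
        out.append((-neg, best_key))
    return out, reads


ratings = [("a", 0.9), ("b", 0.8), ("c", 0.4), ("d", 0.2)]  # sorted access
price_scores = {"a": 0.1, "b": 0.9, "c": 0.5, "d": 1.0}     # random access
best, reads = rank_sequence(ratings, price_scores, max_other=1.0, k=1)
```

Only two sorted reads are needed for the top-1 here, which illustrates the "minimum sorted access" property claimed for RankSequence.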
  • 73. The solution space – SPARQL-Rank algebra: Experimental evidence of performance improvements
§  Example query, 5M-triple dataset
§  Assumption: sorted-access indexes are available
[Figure: execution times; annotation: "Two orders of magnitude better"]
  • 74. The solution space – SPARQL-Rank algebra: Experimental evidence of performance improvements
§  Benchmark: 8 queries on an extension of BSBM (Berlin SPARQL Benchmark)
  • 75. The solution space: Wrap-up of Top-k Reasoning
[Figure: map of the solution space. Types of orders: no ordering, natural, cheap to enforce, expensive to enforce, combinations. Types of reasoning: no reasoning, data-driven, query-driven, combinations. Regions: scalable reasoning, stream reasoning, top-k reasoning, order-aware data management]
  • 76. The solution space: Full-fledged Order-aware reasoning
[Figure: the same map of the solution space (types of orders vs. types of reasoning), with the additional region order-aware reasoning]
  • 77. The solution space: Full-fledged Order-aware reasoning
§  In full-fledged order-aware reasoning, data- and query-driven inference methods have to deal with combinations of natural, cheap-to-enforce and expensive-to-enforce types of orders:
   •  the naive assumption that orderings are independent would have to be relaxed
   •  theories and methods that exploit the mutual relationships among the three types of orders have to be rethought
§  In our running example, methods implementing order-aware reasoning are the only ones able to answer the query:
   •  Which users of social media, currently leading popular discussions on fashion-related topics, are closest to my current location? What are they saying about the shopping district nearby?
  • 78. The solution space: Full-fledged Order-aware reasoning
§  State of the art
   •  None
§  Promising work
   •  The Answer Set Programming (ASP) community has recently proposed a streaming algorithm for ASP [25] that
      1.  ranks the constants referring to domain elements, and
      2.  fetches them, increasing the domain size, until an answer set is found.
§  Challenges
   •  a theoretical framework that unifies and generalises those defined for stream reasoning and top-k reasoning
   •  designing and testing scalable data- and query-driven methods that allow efficient answering of queries involving all types of orders
  • 79. The solution space: Wrap-up of Top-k Reasoning
[Figure: map of the solution space (types of orders vs. types of reasoning), with the regions scalable reasoning, stream reasoning, top-k reasoning, order-aware data management and order-aware reasoning]
  • 80. References: My papers
[IEEE-IS2009] E. Della Valle, S. Ceri, F. van Harmelen, D. Fensel. It's a Streaming World! Reasoning upon Rapidly Changing Information. IEEE Intelligent Systems 24(6): 83-89 (2009)
[EDBT2010] D.F. Barbieri, D. Braga, S. Ceri, M. Grossniklaus. An Execution Environment for C-SPARQL Queries. EDBT 2010
[WWW2009] D.F. Barbieri, D. Braga, S. Ceri, E. Della Valle, M. Grossniklaus. C-SPARQL: SPARQL for continuous querying. WWW 2009: 1061-1062
[IEEE-IS2010] D. Barbieri, D. Braga, S. Ceri, E. Della Valle, Y. Huang, V. Tresp, A. Rettinger, H. Wermser. Deductive and Inductive Stream Reasoning for Semantic Social Media Analytics. IEEE Intelligent Systems, 30 Aug. 2010
[JWS2012] M. Balduini, I. Celino, E. Della Valle, D. Dell'Aglio, Y. Huang, T. Lee, S. Kim, V. Tresp. BOTTARI: an Augmented Reality Mobile Application to deliver Personalized and Location-based Recommendations by Continuous Analysis of Social Media Streams. JWS 2012. IN PRESS
[ESWC2010] D.F. Barbieri, D. Braga, S. Ceri, E. Della Valle, M. Grossniklaus. Incremental Reasoning on Streams and Rich Background Knowledge. ESWC 2010
[SWJ2012] E. Della Valle, S. Schlobach, M. Krötzsch, A. Bozzon, S. Ceri, I. Horrocks. Order Matters! Harnessing a World of Orderings for Reasoning over Massive Data. IN PRESS
[ISWC2012] S. Magliacane, A. Bozzon, E. Della Valle. Efficient Execution of Top-k SPARQL Queries. ISWC 2012. IN PRESS
  • 81. Downloads
§  C-SPARQL Engine (no reasoning support)
   •  A ready-to-go package for Eclipse
      –  http://streamreasoning.org/download
   •  Source code available on request
§  SPARQL-Rank Engine (ARQ-Rank)
   •  Source code and experimental data
      –  http://sparqlrank.search-computing.org/
  • 82. Thank You! Any questions?
emanuele.dellavalle@polimi.it
Keep an eye on http://www.streamreasoning.org
There's much more to come!