Stream Reasoning
http://streamreasoning.org/

It's a Streaming World!
Reasoning upon Rapidly
Changing Information
It's a Streaming World! Reasoning upon Rapidly Changing Information (Milano, 14-2-2014)


Reasoning on rapidly changing information requires: a) semantic models for representing both data streams and continuous querying/reasoning tasks, and b) reasoning algorithms optimised for continuous reactive query answering. This talk presents the application cases from which Stream Reasoning requirements were elicited, briefly covers the findings of five years of research, presents an optimised algorithm for Incremental Reasoning on RDF Streams (IMaRS), and offers an outlook on future research opportunities.

  • Is it possible to make sense in real time of multiple, heterogeneous, enormous and inevitably noisy and incomplete data streams in order to support the decision processes of a large number of users?
  • https://twitter.com/BarackObama/status/266031293945503744

    1. Stream Reasoning
       http://streamreasoning.org/
       It's a Streaming World! Reasoning upon Rapidly Changing Information
       Emanuele Della Valle, emanuele.dellavalle@polimi.it
    2. Agenda
       • Stream Reasoning in a nutshell
       • A sample of Stream Reasoning investigation: IMaRS
       • Conclusions
       Emanuele Della Valle - http://streamreasoning.org/
    3. Agenda
       • Stream Reasoning in a nutshell
         – It's a streaming world
         – Requirements
         – State-of-the-art solutions
         – Stream Reasoning research questions
         – Example of findings
       • A sample of Stream Reasoning investigation: IMaRS
       • Conclusions
    4. It's a streaming World! 1/2
       • Off-shore oil operations
       • Smart Cities
       • Global Contact Center
       • Social networks
       All of these generate data streams!
    5. It's a streaming World! 2/2
       • What is the expected time to failure when that turbine's bearing starts to vibrate as detected in the last 10 minutes?
       • Is public transportation where the people are?
       • Who are the best available agents to route all these unexpected contacts about the tariff plan launched yesterday?
       • Who is driving the discussion about the top 10 emerging topics?
       E. Della Valle, S. Ceri, F. van Harmelen, D. Fensel: It's a Streaming World! Reasoning upon Rapidly Changing Information. IEEE Intelligent Systems 24(6): 83-89 (2009)
    6. Requirements 1/8
       A system able to answer those queries must be able to handle massive datasets:
       • A typical oil production platform is equipped with about 400,000 sensors
       • Telecom data is the most pervasive data source in urban areas; in Milano there are 1.8 million mobile users
       • A global contact centre of a Telecom operator counts 500 million clients
       • Facebook alone has 1.1 billion active users
    7. Requirements 2/8
       A system able to answer those queries must be able to process data streams on the fly:
       • The sensors on a typical oil production platform generate 10,000 observations per minute, with peaks of 100,000 o/m
       • The mobile users in Milano generate 20,000 call/sms/data connections per minute, with peaks of 80,000 c/m
       • A global contact centre receives 10,000 contacts per minute, with peaks of 30,000 c/m
       • Facebook, as of May 2013, observes 3 million "I like" per minute
    8. Requirements 3/8
       A system able to answer those queries must be able to cope with heterogeneous datasets:
       • The sensors on a typical oil production platform have been deployed over 10 years by 10s of different producers
       • Tens of data sources are normally needed to make sense of an urban phenomenon
       • A global contact centre consists of 100s of offices owned by different subsidiary companies engaged yearly
       • Each social network has its own data model, APIs, …
    9. Requirements 4/8
       A system able to answer those queries must be able to cope with incomplete data:
       • 10s of sensors and networking links break down daily
       • Coverage is incomplete
       • Only standard cases are covered by fully machine-processable data records; 100s of contacts per minute are managed ad hoc
       • Conversations happen outside the social networks, too!
    10. Requirements 5/8
        A system able to answer those queries must be able to cope with noisy data:
        • Sensors out of operating range
        • Faulty sensors
        • Agents misunderstand, get tired, …
        • Irony, sarcasm, …
    11. Requirements 6/8
        A system able to answer those queries must be able to provide reactive answers:
        • Detection of dangerous situations must occur within minutes
        • Recommendations to citizens must be delivered within a few seconds
        • Routing a contact through each step of the decision tree must take less than a second
        • Search autocompletion may need to be updated every few minutes
    12. Requirements 7/8
        A system able to answer those queries must be able to support fine-grained information access:
        • Identify a turbine among thousands
        • Locate a bus among thousands
        • Contact an agent among thousands
        • Identify an opinion maker among thousands of influencers for a topic
    13. Requirements 8/8
        A system able to answer those queries must be able to integrate complex domain models of:
        • operational and control processes
        • various city aspects
        • contact management, contract types, agent skills, contactor profiles, …
        • topics, user profiles, …
    14. Requirements (wrap up)
        A system able to answer those queries must be able to:
        • handle massive datasets
        • process data streams on the fly
        • cope with heterogeneous datasets
        • cope with incomplete data
        • cope with noisy data
        • provide reactive answers
        • support fine-grained information access
        • integrate complex domain models
    15. What are data streams anyway?
        • Formally: data streams are unbounded sequences of time-varying data elements
        • Less formally: an (almost) "continuous" flow of information
        • Assumption: recent information is more relevant, as it describes the current state of a dynamic system
    16. The continuous nature of streams
        The nature of streams requires a paradigmatic change*
        • from persistent data
          – to be stored and queried on demand
          – a.k.a. one-time semantics
        • to transient data
          – to be consumed on the fly by continuous queries
          – a.k.a. continuous semantics
        * This paradigmatic change first arose in the DB community in the late '90s
    17. Continuous Semantics
        • Continuous queries are registered over streams that are observed through windows
        [Figure: a dynamic system emits input streams; a window over the streams feeds a registered continuous query, which produces streams of answers]
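
As a rough illustration of the window idea on this slide, the Python sketch below keeps only the elements that arrived within the last `range_` time units and re-evaluates a simple count whenever it is asked. The class name `TimeWindow`, the integer timestamps, and the rule that an element leaves the window exactly `range_` units after it arrived are assumptions made for this example, not something the talk prescribes.

```python
from collections import deque


class TimeWindow:
    """Time-based sliding window over a stream (illustrative sketch)."""

    def __init__(self, range_):
        self.range_ = range_   # window width, in the same unit as the timestamps
        self.items = deque()   # (timestamp, element) pairs in arrival order

    def push(self, timestamp, element):
        # Assumes elements arrive in non-decreasing timestamp order.
        self.items.append((timestamp, element))

    def content(self, now):
        # Evict every element that entered the window at or before now - range_.
        while self.items and self.items[0][0] <= now - self.range_:
            self.items.popleft()
        return [element for _, element in self.items]


def continuous_count(window, now):
    """A trivial continuous query: how many elements are in the window now?"""
    return len(window.content(now))
```

Calling `continuous_count` at regular intervals plays the role of a registered continuous query: the answer changes as elements enter and leave the window, without the query being re-issued by the user.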
    18. DSMS/CEP State of the Art
        Gianpaolo Cugola, Alessandro Margara: Processing flows of information: From data stream to complex event processing. ACM Comput. Surv. 44(3): 15 (2012)
        Content
        • Types of models compared
          – Functional and processing
          – Deployment and interactions
          – Data, Time, and Rule
          – Language
        • # of systems surveyed
          – Academic: 24
          – Industrial: 9
          – Total: 33
        • To learn more
          – http://home.dei.polimi.it/margara/papers/survey.pdf
    19. Existing solutions
        [Table: one row per requirement, one column for DSMS/CEP]
        Requirements: massive datasets; data streams; heterogeneous dataset; incomplete data; noisy data; reactive answers; fine-grained information access; complex domain models
    20. Ontology Based Data Access
        • Given ontology O and query Q, use O to rewrite Q as Q' so that, for any set of ground facts A contained in multiple databases:
          – answer(Q, O, A) = answer(Q', ∅, A)
          – The answer of the query Q using the ontology O for any set of ground facts A is equal to the answer of a query Q' without considering the ontology O
        • Use mapping M to map Q' to multiple SQL queries to the various databases
        [Figure: Q and O enter "Rewrite", which yields Q'; Q' and M enter "Map", which yields SQL; the SQL is evaluated over A to produce the answer]
    21. Existing solutions
        [Table: one row per requirement, columns for DSMS/CEP and OBDA]
        Requirements: massive datasets; data streams; heterogeneous dataset; incomplete data; noisy data; reactive answers; fine-grained information access; complex domain models
    22. Research Question
        Is it possible to make sense in real time of multiple, heterogeneous, gigantic and inevitably noisy and incomplete data streams in order to support the decision process of extremely large numbers of concurrent users?
    23. Stream Reasoning feasibility (intuition)
        • Many relevant reasoning methods are not able to deal with high-frequency data streams
        • However, a trade-off exists between the complexity of the reasoning method and the frequency of the data stream
        [Figure: Complexity vs. Dynamics. Complexity axis: AC0, PTIME, NEXPTIME. Change-frequency axis: from 10^4 Hz down to 1 Hz. Layers: raw stream processing, semantic streams, DL-Lite (querying, re-writing), DL reasoning. Steps: selection, interpretation, abstraction]
        Heiner Stuckenschmidt, Stefano Ceri, Emanuele Della Valle, Frank van Harmelen: Towards Expressive Stream Reasoning. Proceedings of the Dagstuhl Seminar on Semantic Aspects of Sensor Networks, 2010.
    24. Sub-research questions
        • Is it possible to model heterogeneous data streams, continuous queries, and continuous reasoning tasks at the semantic level?
        • Do the ordered nature of data streams and the possibility of forgetting old enough information allow optimizing continuous querying and continuous reasoning tasks, so as to provide reactive answers to a large number of concurrent users without forsaking correctness or completeness?
        • Can Semantic Web and Machine Learning technologies be jointly employed to cope with the noisy and incomplete nature of data streams?
        • Are there practical cases where processing data streams at the semantic level is the best choice?
    25. Findings: Model data streams at the semantic level 1/2
        • State of the art: RDF
          – It allows making statements about resources in the form of subject-predicate-object expressions, called triples in RDF terminology
            E.g. @BarackObama (subject) posts (predicate) "Four more years" (object)
          – A collection of RDF statements represents a labelled, directed graph, called a graph in RDF terminology
            E.g., the tweet above by Barack Obama is connected to 800,000+ Twitter user profiles via retweets, 300,000+ Twitter user profiles via favorites, …
        • Contribution: RDF stream
    26. Findings: Model data streams at the semantic level 2/2
        • State of the art: RDF
        • Contribution: RDF Stream
          – Unbounded sequence of time-varying triples
          – each represented by a pair made of an RDF triple and its timestamp
        subject       | predicate | object                | timestamp
        …
        @BarackObama  | posts     | "Four more years"     | 8:16PM 6 Nov 2012
        @Alice        | posts     | "RT: Four more years" | 8:17PM 6 Nov 2012
        …
        D.F. Barbieri, D. Braga, S. Ceri, E. Della Valle, M. Grossniklaus: Querying RDF streams with C-SPARQL. SIGMOD Record 39(1): 20-26 (2010)
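
To make the "triple plus timestamp" pairing concrete, here is a minimal Python sketch of the two stream elements shown on the slide. The names `TimestampedTriple` and `is_non_decreasing` are hypothetical, and the check that timestamps never go backwards is an assumption about how such a stream is ordered rather than something stated in the talk.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class TimestampedTriple:
    """One element of an RDF stream: an RDF triple paired with its timestamp."""
    subject: str
    predicate: str
    obj: str
    timestamp: datetime


def is_non_decreasing(stream):
    """True if no element carries an earlier timestamp than its predecessor."""
    return all(a.timestamp <= b.timestamp for a, b in zip(stream, stream[1:]))


# The two elements from the slide (6 Nov 2012, 8:16PM and 8:17PM).
stream = [
    TimestampedTriple("@BarackObama", "posts", "Four more years",
                      datetime(2012, 11, 6, 20, 16)),
    TimestampedTriple("@Alice", "posts", "RT: Four more years",
                      datetime(2012, 11, 6, 20, 17)),
]
```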
    27. Findings: Adding continuous semantics to SPARQL
        Who are the opinion makers? i.e., the users who are likely to influence the behavior of their followers

            REGISTER STREAM OpinionMakers COMPUTED EVERY 5m AS
            CONSTRUCT { ?opinionMaker sd:about ?resource }
            FROM STREAM <http://…> [RANGE 30m STEP 5m]
            WHERE {
              ?opinionMaker ?opinion ?resource .
              ?follower sioc:follows ?opinionMaker .
              ?follower ?opinion ?resource .
              FILTER ( cs:timestamp(?follower) > cs:timestamp(?opinionMaker) )
            }
            HAVING ( COUNT(DISTINCT ?follower) > 3 )
    28. Findings: Adding continuous semantics to SPARQL
        Who are the opinion makers? i.e., the users who are likely to influence the behavior of their followers
        The same OpinionMakers query, annotated with the extensions it uses:
        • REGISTER STREAM … COMPUTED EVERY 5m AS: query registration (for continuous execution)
        • REGISTER STREAM … CONSTRUCT: RDF stream added as new output format
        • FROM STREAM <http://…>: FROM STREAM clause
        • [RANGE 30m STEP 5m]: window
        • cs:timestamp(…): built-in to access timestamps
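
For readers less familiar with SPARQL, the following Python sketch mimics what the OpinionMakers query computes over the content of a single window. It is only an approximation under stated assumptions: the function name `opinion_makers` is hypothetical, the `follows` relation is passed in separately instead of being matched as triples, and results are grouped per (opinion maker, resource) pair, which the query on the slide does not spell out.

```python
from collections import defaultdict


def opinion_makers(window, follows, min_followers=3):
    """Return (maker, resource) pairs imitated by more than min_followers followers.

    window:  list of (user, opinion, resource, timestamp) tuples in the window
    follows: set of (follower, followee) pairs (assumed to be given)
    """
    influenced = defaultdict(set)
    for maker, opinion, resource, t_maker in window:
        for follower, opinion2, resource2, t_follower in window:
            same_statement = opinion2 == opinion and resource2 == resource
            # Mirrors FILTER(cs:timestamp(?follower) > cs:timestamp(?opinionMaker)).
            acted_later = t_follower > t_maker
            if same_statement and acted_later and (follower, maker) in follows:
                influenced[(maker, resource)].add(follower)
    # Mirrors HAVING(COUNT(DISTINCT ?follower) > 3).
    return {key for key, followers in influenced.items()
            if len(followers) > min_followers}
```

In a continuous setting this function would be re-evaluated every 5 minutes over the triples of the last 30 minutes, which is what `[RANGE 30m STEP 5m]` asks the engine to do.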
    29. Findings: Cope with noisy and incomplete data
        • Noise is reduced using DSMS techniques
        • Deductive stream reasoning copes with incompleteness by deducing implicit facts
        • Inductive stream reasoning copes with "irreparable" incompleteness by inducing missing facts
        D.F. Barbieri, D. Braga, S. Ceri, E. Della Valle, Y. Huang, V. Tresp, A. Rettinger, H. Wermser: Deductive and Inductive Stream Reasoning for Semantic Social Media Analytics. IEEE Intelligent Systems 25(6): 32-41 (2010)
    30. Findings: Practical cases
        10+ deployments in Sensor Networks, Social Media Analytics and Cloud Computing, e.g.
        • BOTTARI: winner of the Semantic Web Challenge 2011
        • City Data Fusion: winner of an IBM Faculty Award 2013
        M. Balduini, I. Celino, D. Dell'Aglio, E. Della Valle, Y. Huang, T. Lee, S.-H. Kim, V. Tresp: BOTTARI: An augmented reality mobile application to deliver personalized and location-based recommendations by continuous analysis of social media streams. J. Web Sem. 16: 33-41 (2012)
    31. Findings: Optimize for reactive answers
        • C-SPARQL time window-based selections outperform SPARQL filter-based selections
        • Incremental Reasoning on RDF streams (IMaRS): a new reasoning algorithm optimized for reactive query answering
        D.F. Barbieri, D. Braga, S. Ceri, E. Della Valle, M. Grossniklaus: Incremental Reasoning on Streams and Rich Background Knowledge. ESWC (1) 2010: 1-15
        D. Dell'Aglio, E. Della Valle: Incremental Reasoning on RDF Streams. In A. Harth, K. Hose, R. Schenkel (Eds.) Linked Data Management, CRC Press 2014, ISBN 9781466582408
    32. Agenda
        • Stream Reasoning in a nutshell
          – It's a streaming world
          – Requirements
          – State-of-the-art solutions
          – Stream Reasoning research questions
          – Affirmative answers collected in 5 years of investigations
        • A sample of Stream Reasoning investigation: IMaRS
          – Problem statement
          – State-of-the-art solutions
          – Proposed solution
          – Experimental results
        • Conclusions
    33. The problem
        • Reasoning upon rapidly changing information
        • Example
          – Context: Social Media Analytics, a set of methods and technologies for collecting, monitoring, analyzing, summarizing, and visualizing media generated and disseminated in a conversational and distributed mode among online communities
          – Real query: Which have been the most discussed posts in the last hour?
          – Simplified version: How many posts have discussed A in the last 40 min?
          – Data: A, B, and C represent posts; A ← B means B discusses A; given A ← B and B ← C, A ← C means C indirectly discusses A
        NOTE: discusses is a transitive property
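
The simplified query can be answered naively by materializing the transitive closure of `discusses` over the window content and counting the posts that reach A. The Python sketch below does exactly that; the function names and the `(discusser, discussed)` encoding of a triple are assumptions made for the example.

```python
def transitive_closure(edges):
    """Naive fixpoint: if x discusses y and y discusses z, then x discusses z."""
    closure = set(edges)
    changed = True
    while changed:
        changed = False
        for (x, y) in list(closure):
            for (y2, z) in list(closure):
                if y == y2 and (x, z) not in closure:
                    closure.add((x, z))
                    changed = True
    return closure


def count_discussing(edges, post):
    """How many posts discuss `post`, directly or indirectly?"""
    return sum(1 for (_, discussed) in transitive_closure(edges)
               if discussed == post)


# B discusses A, C discusses B, hence C indirectly discusses A.
window_content = {("B", "A"), ("C", "B")}
```

Recomputing the closure from scratch each time the window slides is the "re-materialize" baseline that the rest of the talk tries to beat.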
    34. The problem
        How many posts have discussed A in the last 40 min?
        Time  | Enter window | Exit window | Explicitly in window  | Inferred in window | # Exp. | # Inf. | SUM
        10:00 | A←B          |             | A←B                   |                    | 1      | 0      | 1
        10:10 | B←C          |             | A←B, B←C              | A←C                | 1      | 1      | 2
        10:20 | A←E          |             | A←B, B←C, A←E         | A←C                | 2      | 1      | 3
        10:30 | E←C          |             | A←B, B←C, A←E, E←C    | A←C                | 2      | 1      | 3
        10:40 |              | A←B         | B←C, A←E, E←C         | A←C                | 1      | 1      | 2
        10:50 |              | B←C         | A←E, E←C              | A←C                | 1      | 1      | 2
        11:00 |              | A←E         | E←C                   |                    | 0      | 0      | 0
    35. Is it a hard problem?
        • In the database community it is known as the incremental view maintenance problem
        • The state-of-the-art solution is the DRed algorithm
          – Overestimation of deletion: overestimates deletions by computing all direct consequences of a deletion
          – Rederivation: prunes those estimated deletions for which alternative derivations (via some other facts in the program) exist
          – Insertion: adds the new derivations that are consequences of insertions
        • Deletions are twice as expensive as insertions
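
The two deletion phases of DRed can be sketched for the single transitive rule used in the running example. This is a simplified illustration, not the general algorithm: it assumes facts are `(discusser, discussed)` pairs, that the only rule is transitivity, and that `materialized` is already closed under it. The function names are hypothetical and the insertion phase is left out.

```python
def derivable(fact, facts):
    """Can `fact` be obtained in one transitivity step from `facts`?"""
    x, z = fact
    return any((x, y) in facts and (y, z) in facts for (_, y) in facts)


def dred_delete(explicit, materialized, deleted):
    """Delete explicit facts and repair the materialization (DRed-style)."""
    explicit = set(explicit) - set(deleted)

    # 1. Overestimation: mark every fact with a derivation using a marked fact.
    over = set(deleted)
    changed = True
    while changed:
        changed = False
        for (x, y) in materialized:
            for (y2, z) in materialized:
                uses_marked = (x, y) in over or (y2, z) in over
                if (y == y2 and uses_marked
                        and (x, z) in materialized and (x, z) not in over):
                    over.add((x, z))
                    changed = True
    remaining = set(materialized) - over

    # 2. Rederivation: put back marked facts that are still explicit or that
    #    have an alternative derivation from the facts that survived.
    changed = True
    while changed:
        changed = False
        for fact in over - remaining:
            if fact in explicit or derivable(fact, remaining):
                remaining.add(fact)
                changed = True
    return explicit, remaining
```

With A←B, B←C, A←E and E←C in the window, deleting A←B first marks A←C as a possible deletion and then restores it, because C still reaches A via E. Every deletion pays for both phases, which is why deletions cost more than insertions.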
    36. Is it a hard problem?
        [Chart: time in ms (axis ticks 10, 100, 1000) against % of deletions w.r.t. the size of the DB (0% to 14%). Two curves, "rematerialize the view after each update" and "incremental view maintenance", meet at a break-even point.]
        In a normal DB, incremental view maintenance is convenient in most cases:
        • lots of inserts, few deletions
        • small % of deletions w.r.t. the size of the DB
    37. Is it a hard problem?
        [Chart: time in ms (axis ticks 10, 100, 1000) against % of deletions w.r.t. the content of the window (0% to 14%). Two curves, "rematerialize the view after each update" and "incremental view maintenance", meet at a break-even point.]
        In a streaming setting:
        • the number of inserts equals the number of deletions
        • the data exiting the window causes a large % of deletions w.r.t. the size of the window
    38. What is the target behaviour?
        [Chart: time in ms (axis ticks 10, 100, 1000) against % of deletions w.r.t. the content of the window (0% to 14%). Curves: "rematerialize the view after each update", "incremental view maintenance" (with their break-even point), and the target behaviour.]
    39. Observation
        • DRed works with random insertions and deletions
        • In a streaming setting, when a triple enters the window, given the size of the window, the reasoner already knows when it will be deleted!
        • E.g., if the window is 40 minutes long and it is 10:00, the triple(s) entering now will exit at 10:40
        • Conclusion: deletions are predictable
        [Figure: the enter/exit timeline of the running example: A←B, B←C, A←E and E←C enter at 10:00, 10:10, 10:20 and 10:30; A←B, B←C and A←E exit at 10:40, 10:50 and 11:00]
    40. Optimized Stream Reasoning (IMaRS)
        • Idea:
          – add an expiration time to each triple and
          – use a hash table to index triples by their expiration time
        • The algorithm
          1. deletes expired triples,
          2. adds the new derivations that are consequences of insertions, annotating each inferred triple with an expiration time (the min of those of the triples it is derived from), and
          3. when multiple derivations occur, keeps the max expiration time for each triple derived more than once.
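
The three steps above can be sketched in Python for the running example. This is a minimal sketch under several assumptions, not the published IMaRS implementation: the only inference rule is the transitivity of `discusses`, triples are `(discusser, discussed)` pairs, time is an integer number of minutes, and the class and method names are made up for the illustration.

```python
from collections import defaultdict


class ExpiringMaterialization:
    """Materialization in which every triple carries an expiration time."""

    def __init__(self, window):
        self.window = window                 # window width in minutes
        self.expiry = {}                     # triple -> expiration time
        self.by_expiry = defaultdict(set)    # expiration time -> triples

    def _upsert(self, triple, exp, queue):
        # Step 3: with multiple derivations, keep the max expiration time.
        if exp > self.expiry.get(triple, float("-inf")):
            self.expiry[triple] = exp
            self.by_expiry[exp].add(triple)
            queue.append(triple)

    def advance(self, now, new_triples=()):
        # Step 1: delete expired triples by looking up their expiration time.
        for exp in [e for e in self.by_expiry if e <= now]:
            for triple in self.by_expiry.pop(exp):
                if self.expiry.get(triple) == exp:   # skip superseded entries
                    del self.expiry[triple]
        # Step 2: insert new triples and propagate their consequences; an
        # inferred triple expires with the earliest-expiring premise (min).
        queue = []
        for triple in new_triples:
            self._upsert(triple, now + self.window, queue)
        while queue:
            x, y = queue.pop()
            e_xy = self.expiry[(x, y)]
            for (a, b), e_ab in list(self.expiry.items()):
                if b == x:   # a discusses x and x discusses y
                    self._upsert((a, y), min(e_ab, e_xy), queue)
                if a == y:   # x discusses y and y discusses b
                    self._upsert((x, b), min(e_xy, e_ab), queue)

    def count_discussing(self, post):
        return sum(1 for (_, discussed) in self.expiry if discussed == post)
```

Replaying the example with minutes counted from 10:00 and a 40-minute window: A←C is first inferred with expiration 40 (10:40), is extended to 60 (11:00) once E←C arrives, and therefore survives the expiry of A←B at 10:40 without any overestimation or rederivation.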
    41. The problem
        How many posts have discussed A in the last 40 min?
        Time  | Enter window | Explicitly in window (expiration) | Inferred in window (expiration) | # Exp. | # Inf. | SUM
        10:00 | A←B          | A←B (10:40)                       |                                 | 1      | 0      | 1
        10:10 | B←C          | A←B (10:40), B←C (10:50)          | A←C (10:40)                     | 1      | 1      | 2
        A←C gets the min of the two expiration times.
    42. The problem
        How many posts have discussed A in the last 40 min?
        Time  | Enter window | Explicitly in window (expiration)     | Inferred in window (expiration) | # Exp. | # Inf. | SUM
        10:20 | A←E          | A←B (10:40), B←C (10:50), A←E (11:00) | A←C (10:40)                     | 2      | 1      | 3
        A←C remains unchanged.
    43. The problem
        How many posts have discussed A in the last 40 min?
        Time  | Enter window | Explicitly in window (expiration)                  | Inferred in window (expiration) | # Exp. | # Inf. | SUM
        10:30 | E←C          | A←B (10:40), B←C (10:50), A←E (11:00), E←C (11:10) | A←C (11:00)                     | 2      | 1      | 3
        A←C is inferred again with a longer-lasting expiration time; the max of the expiration times is used.
    44. The problem
        How many posts have discussed A in the last 40 min?
        Time  | Exit window | Explicitly in window (expiration)     | Inferred in window (expiration) | # Exp. | # Inf. | SUM
        10:40 | A←B         | B←C (10:50), A←E (11:00), E←C (11:10) | A←C (11:00)                     | 1      | 1      | 2
        This deletion has no effect on the inferred triples.
    45. The problem
        How many posts have discussed A in the last 40 min?
        Time  | Exit window | Explicitly in window (expiration) | Inferred in window (expiration) | # Exp. | # Inf. | SUM
        10:50 | B←C         | A←E (11:00), E←C (11:10)          | A←C (11:00)                     | 1      | 1      | 2
        This deletion has no effect on the inferred triples.
    46. The problem
        How many posts have discussed A in the last 40 min?
        Time  | Exit window | Explicitly in window (expiration) | Inferred in window (expiration) | # Exp. | # Inf. | SUM
        11:00 | A←E         | E←C (11:10)                       |                                 | 0      | 0      | 0
        Deletion of inferred triples is just a lookup: A←C expires at 11:00 together with A←E.
    47. Experimental results
        [Chart: processing time against % of deletions w.r.t. the content of the window, for three strategies]
        • Re-materialize after each window slide
        • Use DRed
        • IMaRS
    48. Implication of the experimental results
        Comparison of the average time needed to answer a C-SPARQL query, when 2% of the content exits the window each time it slides, using:
        • a backward reasoner on the window content
        • DRed + standard SPARQL on the materialization
        • IMaRS + standard SPARQL on the materialization
        [Chart: one bar each for Backward, DRed + SPARQL, and IMaRS + SPARQL]
    49. Agenda
        • Stream Reasoning in a nutshell
          – It's a streaming world
          – Requirements
          – State-of-the-art solutions
          – Stream Reasoning research questions
          – Affirmative answers collected in 5 years of investigations
        • A sample of Stream Reasoning investigation: IMaRS
          – Problem statement
          – State-of-the-art solutions
          – Proposed solution
          – Experimental results
        • Conclusions
    50. Scholarly impact
        [Chart: # of citations (0 to 140) per year, 2009 to 2013, broken down by contribution: SR Vision, C-SPARQL, IMaRS, Inductive SR, BOTTARI]
    51. Recent Work: Benchmarks are needed
        • Stream Reasoning benchmarks are appearing
          – SRBench: Ying Zhang, Minh-Duc Pham, Óscar Corcho, Jean-Paul Calbimonte, http://www.w3.org/wiki/SRBench
          – LSBench: Danh Le Phuoc, Minh Dao-Tran, Minh-Duc Pham, Peter A. Boncz, Thomas Eiter, Michael Fink, http://code.google.com/p/lsbench/
        • What shall we evaluate?
          – PeCK methodology: System Properties -> Challenges -> KPI
        • How?
          – Maximising throughput is OK, but do not forsake correctness!
        T. Scharrenbach, J. Urbani, A. Margara, E. Della Valle, A. Bernstein: Seven Commandments for Benchmarking Semantic Flow Processing Systems. ESWC 2013: 305-319
        D. Dell'Aglio, J.-P. Calbimonte, M. Balduini, Ó. Corcho, E. Della Valle: On Correctness in RDF Stream Processor Benchmarking. ISWC (2) 2013: 326-342
    52. Recent Work: Supporting industrial take-up
        • W3C's RDF Stream Processing Community Group
          – http://www.w3.org/community/rsp/
          – Goals
            · collect use cases
            · survey existing approaches
            · define RDF stream models
            · define SPARQL extensions for RDF stream processing
            · define RESTful interfaces
          – 52 members from 24 organizations (6 from industry)
          – Biweekly phone calls since September 2013
    53. Next step: Generalizing from time to any ordering
        • Observation: order reflects recency, relevance, …
        [Figure: types of orders (recency, relevance, …, combinations) set against reasoning (no / yes). Labels in the figure: DSMS/CEP, stream reasoning, top-k Q/A, top-k reasoning, continuous top-k Q/A, order-aware reasoning, indexes, traditional solutions, scalable reasoning]
        Emanuele Della Valle, Stefan Schlobach, Markus Krötzsch, Alessandro Bozzon, Stefano Ceri, Ian Horrocks: Order matters! Harnessing a world of orderings for reasoning over massive data. Semantic Web 4(2): 219-231 (2013)
        Sara Magliacane, Alessandro Bozzon, Emanuele Della Valle: Efficient Execution of Top-K SPARQL Queries. International Semantic Web Conference (1) 2012: 344-360
    54. Credits
        Politecnico di Milano's colleagues
        • Vision: Prof. S. Ceri
        • C-SPARQL: M. Balduini, D. Barbieri, D. Braga, S. Ceri and M. Grossniklaus
        • Benchmarking: M. Balduini, D. Dell'Aglio, and A. Margara
        • SPARQL-RANK: S. Magliacane and A. Bozzon
        People I directly worked with on the topic
        • CEFRIEL: I. Celino
        • Saltlux: T. Lee and S.-H. Kim
        • SIEMENS: Prof. V. Tresp and Y. Huang
        • U-Innsbruck: Prof. D. Fensel
        • U-Oxford: Prof. I. Horrocks and M. Krötzsch
        • UP-Madrid: Ó. Corcho and J.-P. Calbimonte
        • U-Mannheim: Prof. H. Stuckenschmidt
        • U-Zurich: Prof. A. Bernstein and T. Scharrenbach
        • VU-Amsterdam: Prof. F. van Harmelen, S. Schlobach and J. Urbani
        The broader research community that showed interest in stream reasoning
    55. Stream Reasoning
        http://streamreasoning.org/
        Thank you for paying attention! Any questions?
        Emanuele Della Valle, emanuele.dellavalle@polimi.it