- 1. Efficient Filtering in Publish-Subscribe Systems using BDD. Alexis Campailla, Sagar Chaki, Edmund Clarke, Somesh Jha, Helmut Veith. Prepared by Nabeel Mohamed, 4/16/08.
- 2. Outline: Research problem at hand; Content-based Publish-Subscribe; Subscription Query Language; BDD Semantics; BDD-Based Matching; Experimental Results; Discussion (Pros and Cons).
- 3. Research Problem at Hand: Loosely coupled interactions in publish-subscribe systems make it possible to build very large-scale systems. However, the filtering techniques used are a major bottleneck, so the efficiency of the filtering technique plays a major role in scalability. Whatever technique is used should also be provably correct.
- 4. Major Contributions: (1) A precise semantics for matching messages (events) to subscriptions (subscription queries). (2) Modeling filtering as a satisfiability check in a BDD.
- 6. Publish-Subscribe Systems (architecture diagram): Publishers publish messages; subscribers call Subscribe() and Unsubscribe() and receive Notify() callbacks; distributed content routers between them handle subscription management and routing.
- 7. Publish-Subscribe Systems: Publishers and subscribers are loosely coupled: space decoupled, time decoupled, and synchronization decoupled. Content routers (brokers) form a structured P2P system. This makes the systems scalable.
- 8. Message (Event) Filtering: Filtering means matching incoming messages (events) generated by publishers against subscription criteria. It is a main task of content routers (brokers), performed by the filtering engine. Content-based pub-sub systems route messages (events) based on the content itself. Example: in a financial ticker, filter quotes with symbol = Google and offer price < 400.
- 9. Example Pub-Sub Systems: Stock market feeds, which deliver financial data such as stock quotes, trade reports, and news to customers (the OPRA feed disseminates more than 100,000 quotes/sec); sensor networks; network traffic analysis; transaction log analysis.
- 10. Desirable Functions of a Filtering Engine: Correctness: incoming messages are matched correctly against subscription criteria. Expressiveness: a rich subscription language. Efficiency: real-time matching. Scalability: handling a large number of subscriptions. Dynamism: subscriptions can be added and removed online.
- 11. Related Work: Most existing systems support only conjunctive subscriptions: GRYPHON, SIENA, Le Subscribe. Example: the following subscription requires 27 GRYPHON-like subscriptions, while a BDD handles it naturally.
- 12. Related Work: Some systems, such as ELVIN, have higher expressive power at the expense of less efficient filtering. Can we come up with an efficient filtering technique while providing an expressive subscription language? BDD-based filtering may be employed in existing systems to improve matching efficiency.
- 14. Subscription Query Language: The language used to describe subscription criteria (subscriptions). There are three subscription languages of increasing complexity: SiSL (Simple Subscription Language), StSL (Strict Subscription Language), and DeSL (Default Subscription Language).
- 15. Messages and Attributes: V = <v1, …, vn> is a finite sequence of attributes. Each attribute vi has a type and a corresponding domain. Event schema = the sequence of the attribute types, <type(v1), …, type(vn)>.
- 16. Messages and Attributes: A message is an assignment of values to some (not necessarily all) of the attributes. Formally, a message is a mapping m such that for each attribute v, either m(v) is a value in the domain of v, or m(v) = * (m does not define v). A message is total if it defines all attributes in V.
- 17. Messages and Attributes, Example 1: Let V = <company, product, price> over the event schema <STR, STR, DBL>. Consider the following message: <company>IBM</company> <product>PC AT, 20 Mhz, 256 KB RAM</product> <price>5000</price>. This describes a total message m1 where m1(company) = “IBM”, m1(product) = “PC AT, 20 Mhz, 256 KB RAM” and m1(price) = 5000.
- 18. Messages and Attributes, Example 2: Consider the following message: <company>IBM</company> <product>PC AT, 20 Mhz, 256 KB RAM</product>. This describes a different message m2 which is not total (i.e. it is partial), since m2(price) = *.
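
The two example messages can be written down directly as mappings. The following is a minimal sketch, not the authors' implementation: it assumes Python's `None` stands in for the slides' `*`, and the names `make_message`, `is_total`, `SCHEMA`, and `UNDEFINED` are hypothetical helpers introduced only for illustration.

```python
from typing import Dict, Optional, Sequence, Union

Value = Union[str, float]
UNDEFINED = None  # assumption: None plays the role of "*" (attribute not defined)

# Attribute sequence V and its event schema <STR, STR, DBL>, as in Example 1.
V: Sequence[str] = ("company", "product", "price")
SCHEMA = {"company": str, "product": str, "price": float}


def make_message(**values: Value) -> Dict[str, Optional[Value]]:
    """Build a mapping over all of V; attributes not supplied map to UNDEFINED."""
    unknown = set(values) - set(V)
    if unknown:
        raise KeyError(f"attributes not in V: {sorted(unknown)}")
    for name, value in values.items():
        if not isinstance(value, SCHEMA[name]):
            raise TypeError(f"{name} must be of type {SCHEMA[name].__name__}")
    return {v: values.get(v, UNDEFINED) for v in V}


def is_total(m: Dict[str, Optional[Value]]) -> bool:
    """A message is total if it defines every attribute in V."""
    return all(m[v] is not UNDEFINED for v in V)


# m1 is total; m2 leaves price undefined, so it is partial.
m1 = make_message(company="IBM", product="PC AT, 20 Mhz, 256 KB RAM", price=5000.0)
m2 = make_message(company="IBM", product="PC AT, 20 Mhz, 256 KB RAM")
```

Representing `*` explicitly, rather than omitting the key, keeps every message defined on all of V, which mirrors the formal definition of a message as a mapping.
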
- 19. Three Subscription Languages: SiSL (Simple Subscription Language): all messages are total. StSL (Strict Subscription Language): messages define all attributes that occur in the query (subscription criteria); SiSL is a subset of StSL. DeSL (Default Subscription Language): all attributes are initialized to default values (e.g. using NULL), which extends the functionality of SiSL to heterogeneous message formats.
- 20. Formalizing SiSL Queries (Subscriptions): Atomic formulas. Let v be an attribute in V. If and then the formulas v = c, v < c, c < v are atomic formulas. If , atomic formulas are defined similarly. If then the formulas are atomic formulas. ( ≡ substring)
- 21. Formalizing SiSL Queries (Subscriptions): Atoms = the set of atomic formulas. A query is a Boolean combination of atomic formulas. = the set of attributes occurring in ; = the set of atomic formulas occurring in
- 22. Formalizing SiSL Queries (Subscriptions): Abbreviations
- 23. Example: SiSL Query. The following SiSL query matches all messages for 1000 Mhz PCs manufactured by IBM, Dell or Siemens which cost at most $1000.
- 24. Formalizing SiSL Queries (Subscriptions): Definition: the instantiation of a query by a message m is the query obtained by replacing every variable v for which m(v) ≠ * by m(v). Definition: a SiSL query matches the total message m if its instantiation by m evaluates to true.
- 25. Formalizing StSL Queries (Subscriptions): StSL (Strict Subscription Language) is a generalization of SiSL. Definition (adequacy): a message m is adequate for a query if m(v) ≠ * holds for every attribute v occurring in the query. Definition: the query matches m iff m is adequate for the query and the instantiation of the query by m evaluates to true.
- 26. Formalizing DeSL Queries (Subscriptions): DeSL (Default Subscription Language) is the most general of the three. For each attribute vi, there is a default value. Definition: the default extension of m is the total message that agrees with m on every attribute m defines and assigns the default value to every attribute v with m(v) = *.
- 27. Formalizing DeSL Queries (Subscriptions): Definition: the query matches the message m under default semantics if its instantiation by the default extension of m evaluates to true.
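
The three matching semantics differ only in how they treat undefined attributes, which a small evaluator makes concrete. This is a hedged sketch under stated assumptions: queries are encoded as nested tuples, only the operators `=`, `<`, and `>` are supported, `None` stands for `*`, and the query, the default values, and all function names are hypothetical rather than taken from the slides.

```python
from typing import Any, Dict, Optional, Set, Tuple

UNDEFINED = None  # assumption: None plays the role of "*"
Message = Dict[str, Optional[Any]]
# Query encoding (an assumption of this sketch):
#   ("atom", attr, op, const) | ("not", q) | ("and", q1, q2) | ("or", q1, q2)
Query = Tuple

OPS = {
    "=": lambda x, c: x == c,
    "<": lambda x, c: x < c,
    ">": lambda x, c: x > c,
}


def attributes(q: Query) -> Set[str]:
    """Set of attributes occurring in the query."""
    if q[0] == "atom":
        return {q[1]}
    return set().union(*(attributes(sub) for sub in q[1:]))


def evaluate(q: Query, m: Message) -> bool:
    """Instantiate q by m and evaluate it; m must define every attribute of q."""
    kind = q[0]
    if kind == "atom":
        _, attr, op, const = q
        return OPS[op](m[attr], const)
    if kind == "not":
        return not evaluate(q[1], m)
    if kind == "and":
        return evaluate(q[1], m) and evaluate(q[2], m)
    if kind == "or":
        return evaluate(q[1], m) or evaluate(q[2], m)
    raise ValueError(f"unknown query node: {kind!r}")


def matches_sisl(q: Query, m: Message) -> bool:
    """SiSL: only total messages are allowed."""
    if any(value is UNDEFINED for value in m.values()):
        raise ValueError("SiSL requires a total message")
    return evaluate(q, m)


def matches_stsl(q: Query, m: Message) -> bool:
    """StSL: m must be adequate for q (define all of q's attributes) and satisfy q."""
    adequate = all(m.get(v, UNDEFINED) is not UNDEFINED for v in attributes(q))
    return adequate and evaluate(q, m)


def matches_desl(q: Query, m: Message, defaults: Dict[str, Any]) -> bool:
    """DeSL: evaluate q on the default extension of m."""
    extended = {
        v: defaults[v] if m.get(v, UNDEFINED) is UNDEFINED else m[v]
        for v in defaults
    }
    return evaluate(q, extended)


# Hypothetical query: company = "IBM" AND price < 6000
q = ("and", ("atom", "company", "=", "IBM"), ("atom", "price", "<", 6000.0))
m1 = {"company": "IBM", "product": "PC AT, 20 Mhz, 256 KB RAM", "price": 5000.0}
m2 = {"company": "IBM", "product": "PC AT, 20 Mhz, 256 KB RAM", "price": UNDEFINED}
DEFAULTS = {"company": "", "product": "", "price": 0.0}  # hypothetical defaults
```

With these inputs the partial message `m2` is rejected under StSL because it is not adequate for `q`, yet it matches under DeSL once `price` is filled in with the assumed default of 0.0.
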
- 29. BDDs (Binary Decision Diagrams): Notation. A is a set of propositional variables, with a linear ordering (the variable ordering) on A. An ordered BDD over A has non-terminal nodes labeled by variables in A and terminal nodes labeled 0 or 1. Each node v in the BDD represents a Boolean function.
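- 30. Properties of BDDs: Each non-terminal node v has two out-edges: a low edge and a high edge. Let a non-terminal node v with label ai have successors u and w at its low and high edges, respectively. Then the function represented by v is (NOT ai AND fu) OR (ai AND fw), where fu and fw are the functions represented by u and w. Size = number of nodes in the BDD.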
- 31. Example: BDD. The following BDD represents the Boolean function x AND (y OR z). The variable ordering is
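
Since the figure for this example did not survive, here is one possible encoding of a BDD for x AND (y OR z), evaluated with the low/high rule from the previous slide. It is a sketch under assumptions: the ordering x < y < z is assumed (the slide's ordering is missing), and the node names `nx`, `ny`, `nz` are hypothetical.

```python
from itertools import product
from typing import Dict, Tuple, Union

Node = Union[int, str]  # 0 and 1 are the terminals; strings name non-terminals

# node -> (label, low successor, high successor), assuming the ordering x < y < z
BDD: Dict[str, Tuple[str, Node, Node]] = {
    "nx": ("x", 0, "ny"),   # x = 0 -> terminal 0, x = 1 -> test y
    "ny": ("y", "nz", 1),   # y = 0 -> test z,     y = 1 -> terminal 1
    "nz": ("z", 0, 1),      # z = 0 -> terminal 0, z = 1 -> terminal 1
}


def f(node: Node, assignment: Dict[str, bool]) -> bool:
    """Function of a node: (NOT a AND f_low) OR (a AND f_high)."""
    if node in (0, 1):
        return bool(node)
    label, low, high = BDD[node]
    a = assignment[label]
    return (not a and f(low, assignment)) or (a and f(high, assignment))


# Compare the BDD against the formula on all eight assignments.
agrees = all(
    f("nx", {"x": x, "y": y, "z": z}) == (x and (y or z))
    for x, y, z in product([False, True], repeat=3)
)
```

This BDD has three non-terminal nodes, one per variable, plus the two terminals.
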
- 32. Shared BDDs (SBDDs): While an OBDD represents one Boolean function, an SBDD represents multiple Boolean functions. An SBDD is a collection of component OBDDs respecting the same variable ordering. An SBDD has a set of output nodes Vo = {o1, …, on}, which correspond to the Boolean functions <f1, …, fn> respectively.
- 33. SBDDs: Every root node of a component OBDD belongs to Vo. The shared BDD is written as the BDD together with its output nodes {o1, …, on}, and it is polynomial-time computable from any other shared BDD over A for <f1, …, fn>.
- 34. Example: Shared BDD. Node 1 represents ; Node 2 represents ; Node 3 represents
- 35. BDD Data Structure: A BDD with n nodes is represented as a graph whose vertices are the natural numbers 1, …, n. The adjacency relation is described by an array of size n whose ith element is (low[i], high[i], label[i], value[i]): low[i] = low successor of i; high[i] = high successor of i; label[i] = label of i; value[i] = used later to store the result of the BDD evaluation corresponding to i.
- 36. BDD Evaluation: The EvalBDD algorithm computes the value of each node in the BDD under a given truth assignment, where the ith component of the assignment gives the value of the ith variable.
- 37. BDD Evaluation: Notice that the values of the Boolean functions associated with all output nodes can be computed in one pass.
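
The slide's algorithm listing did not survive extraction, so the following is only a plausible reconstruction of a one-pass, bottom-up evaluation over the array representation, not the authors' EvalBDD. It assumes that indices 0 and 1 hold the terminals and that every non-terminal node has a larger index than both of its successors; the function name `eval_bdd` and the example node numbering are hypothetical.

```python
from typing import Dict, List, Optional

# Shared BDD for f1 = x AND (y OR z) and f2 = y OR z, as parallel arrays.
# Assumptions: index 0 is terminal 0, index 1 is terminal 1, and every
# non-terminal node has a larger index than both of its successors.
label: List[Optional[str]] = [None, None, "z", "y", "x"]
low: List[Optional[int]] = [None, None, 0, 2, 0]
high: List[Optional[int]] = [None, None, 1, 1, 3]
outputs = {"f1": 4, "f2": 3}  # output nodes of the shared BDD


def eval_bdd(assignment: Dict[str, bool]) -> List[bool]:
    """Fill value[i] for every node in a single pass from low to high indices."""
    n = len(label)
    value = [False] * n
    value[1] = True  # terminal 1
    for i in range(2, n):
        # Follow the high edge if the node's variable is true, else the low edge;
        # the successor's value is already known because it has a smaller index.
        successor = high[i] if assignment[label[i]] else low[i]
        value[i] = value[successor]
    return value


values = eval_bdd({"x": False, "y": True, "z": False})
results = {name: values[node] for name, node in outputs.items()}
```

Because each node is visited exactly once and shared nodes are stored once, the cost of evaluating all output functions is linear in the number of nodes of the shared BDD.
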
- 38. BDD Restrictions: The idea is to restrict the possible truth assignments to those under which an external constraint f (a Boolean function over A) evaluates to true. Definition: f-restriction
- 40. Query BDDs: Key idea: represent many subscription queries by a single shared BDD whose nodes correspond to atomic sub-formulas of the queries. Messages are matched against the queries by simply running EvalBDD on the shared BDD.
- 41. Query BDDs: Consider a sequence of queries over the set of attributes V, and let A be the set of atomic sub-formulas of the queries. Each atomic sub-formula a in A is assigned a propositional variable. Each query then yields a Boolean query, obtained by substituting each atomic sub-formula a with its propositional variable.
- 42. Example: Query BDDs. Let & be two subscriptions received. Then, = . Three atomic sub-formulas => three propositional variables.
- 43. Example: Query BDDs. Let the variable order be . SBDD corresponding to the queries
- 44. Query Matching: SiSL. Use the EvalBDD algorithm for query matching. A query Qi is considered matched if the BDD node corresponding to Qi evaluates to 1. Bottom-up evaluation ensures that sub-queries are evaluated only once.
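
Putting the pieces together, matching a total message means evaluating each atomic sub-formula on the message and then running a single pass over the shared BDD. The sketch below uses hypothetical subscriptions (the slides' own example queries were lost) built on the financial-ticker example; the atom names, the `volume` attribute, and the function `match` are assumptions made for illustration.

```python
from typing import Any, Callable, Dict, List, Optional, Set, Tuple

# Atomic sub-formulas, each assigned a propositional variable a1..a3.
# The attribute "volume" and its threshold are hypothetical.
ATOMS: Dict[str, Callable[[Dict[str, Any]], bool]] = {
    "a1": lambda m: m["symbol"] == "Google",
    "a2": lambda m: m["price"] < 400,
    "a3": lambda m: m["volume"] > 100000,
}

# Hypothetical subscriptions: Q1 = a1 AND (a2 OR a3), Q2 = a2 OR a3.
# Shared BDD as (label, low, high); indices 0 and 1 are the terminals and
# every node has a larger index than its successors.
NODES: List[Optional[Tuple[str, int, int]]] = [
    None,            # 0: terminal 0
    None,            # 1: terminal 1
    ("a3", 0, 1),    # 2
    ("a2", 2, 1),    # 3: a2 OR a3 (shared by Q1 and Q2)
    ("a1", 0, 3),    # 4: a1 AND (a2 OR a3)
]
OUTPUTS = {"Q1": 4, "Q2": 3}


def match(message: Dict[str, Any]) -> Set[str]:
    """Return the subscriptions whose output node evaluates to 1."""
    truth = {name: test(message) for name, test in ATOMS.items()}
    value = [False, True] + [False] * (len(NODES) - 2)
    for i in range(2, len(NODES)):
        lab, lo, hi = NODES[i]
        value[i] = value[hi if truth[lab] else lo]
    return {query for query, node in OUTPUTS.items() if value[node]}


matched = match({"symbol": "Google", "price": 350.0, "volume": 10})
```

Node 3 is shared by both subscriptions, so the sub-query a2 OR a3 is evaluated once per message no matter how many subscriptions contain it.
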
- 45. Query Matching: DeSL. Same as handling complete (total) messages. When a message is received, it is extended to a total message before the matching is performed.
- 46. Query Matching: StSL. Recall that a message m matches a subscription Q iff m is adequate for Q and m satisfies Q. A modified EvalBDD can be used to perform faster matching. Key ideas: an undefined atom renders all sub-formulas in which it occurs undefined; treat * as a new value, undefined.
- 47. Query Matching: StSL. MVEvalBDD for StSL is significantly faster than EvalBDD for SiSL.
- 49. # Nodes in SBDD vs. # Subscriptions: The number of nodes scales almost linearly with the number of subscriptions, which indicates high scalability. Restriction further reduces the node count, lowering memory requirements.
- 50. Matching Time for SiSL and StSL: Time for StSL queries. Inputs: the number of subscription queries and the message density (how close to total the messages are). Partial messages can be matched quickly.
- 52. Variable Ordering vs. BDD Size: Variable ordering has a tremendous influence on BDD size.
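
The effect of variable ordering can be seen on a standard textbook function, (x1 AND y1) OR (x2 AND y2) OR (x3 AND y3), which is not taken from the slides. The sketch below builds a reduced ordered BDD by brute-force Shannon expansion (exponential in the number of variables, so suitable only for tiny examples) and counts its non-terminal nodes; `robdd_size` is a hypothetical helper, not a library function.

```python
from typing import Callable, Dict, Sequence, Tuple


def robdd_size(f: Callable[[Dict[str, bool]], bool], order: Sequence[str]) -> int:
    """Count non-terminal nodes of the reduced ordered BDD of f under `order`."""
    unique: Dict[Tuple[str, int, int], int] = {}  # (var, low, high) -> node id

    def build(i: int, env: Dict[str, bool]) -> int:
        if i == len(order):
            return int(bool(f(env)))  # terminal node 0 or 1
        var = order[i]
        low = build(i + 1, {**env, var: False})
        high = build(i + 1, {**env, var: True})
        if low == high:
            return low  # redundant test: skip the node
        key = (var, low, high)
        if key not in unique:
            unique[key] = len(unique) + 2  # ids 0 and 1 are reserved for terminals
        return unique[key]  # isomorphic subgraphs share one node

    build(0, {})
    return len(unique)


def f(e: Dict[str, bool]) -> bool:
    return (e["x1"] and e["y1"]) or (e["x2"] and e["y2"]) or (e["x3"] and e["y3"])


interleaved = robdd_size(f, ["x1", "y1", "x2", "y2", "x3", "y3"])  # 6 nodes
grouped = robdd_size(f, ["x1", "x2", "x3", "y1", "y2", "y3"])      # 14 nodes
```

With each x next to its y, the BDD needs one node per variable. With all x variables first, the BDD must remember which of them were true, and the node count grows exponentially in the number of pairs.
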
- 53. Pros: Introduces a well-formed semantics for describing the matching process in publish-subscribe systems. Treating matching as satisfiability checking in an SBDD makes it possible to check multiple subscriptions incrementally. Scalable. StSL is more efficient than SiSL.
- 54. Cons/Improvements: Does not describe any heuristics for selecting the node (variable) ordering, which is NP-hard; can we order based on the significance of the attributes involved? Does not explore the possibility of eliminating redundancies due to semantically related atomic sub-formulas (e.g. price = 100 and price > 80), which is again NP-hard; can we further reduce the node count by exploiting the semantics without causing side effects? The efficiency of matching is not compared with existing systems.
- 55. Conclusion: Two major contributions: (1) a precise semantics for matching messages to subscriptions; (2) modeling filtering as a satisfiability check in a BDD.
- 56. Questions
- 57. Thank You
