Efficient Filtering in Pub-Sub Systems using BDD

Slides prepared based on the paper "Efficient Filtering in Publish-Subscribe Systems using BDD" by Alexis Campailla, Sagar Chaki, Edmund Clarke, Somesh Jha, Helmut Veith

Published in: Technology, Business

  1. Efficient Filtering in Publish-Subscribe Systems using BDD
      Alexis Campailla, Sagar Chaki, Edmund Clarke, Somesh Jha, Helmut Veith
      Prepared by Nabeel Mohamed, 4/16/08
  2. Outline
      • Research problem at hand
      • Content-based Publish-Subscribe
      • Subscription Query Language
      • BDD Semantics
      • BDD Based matching
      • Experimental Results
      • Discussion (Pros and Cons)
  3. Research Problem at Hand
      • Loosely coupled interactions in publish-subscribe systems make it possible to build very large scale systems
      • However, the filtering techniques used are a major bottleneck
      • The efficiency of the filtering technique plays a major role in scalability
      • Whatever technique we use should be provably correct
  4. Major Contributions
      • A precise semantics to match messages (events) to subscriptions (subscription queries)
      • Modeling filtering as a satisfiability check in BDD
  6. Publish-Subscribe Systems
      [Diagram: publishers and subscribers connected through distributed content routers, which handle subscription management and routing. Operations shown: publish, subscribe / Subscribe(), unsubscribe / Unsubscribe(), notify / Notify()]
  7. Publish-Subscribe Systems
      • Publishers and Subscribers are loosely coupled
        ◦ Space decoupled
        ◦ Time decoupled
        ◦ Synchronization decoupled
      • Content routers (brokers) form a structured p2p system
      • Result: scalable systems
  8. Message (Event) Filtering
      • Filtering
        ◦ Matching incoming messages (events) generated by Publishers with subscription criteria
        ◦ A main task of content routers (brokers): the filtering engine
      • Content-based pub-sub systems route messages (events) based on the content itself
      • Example: filter quotes with symbol = Google and offer price < 400 in a financial ticker
  9. Example Pub-Sub Systems
      • Stock market feeds
        ◦ For delivery of financial data such as stock quotes, trade reports, news, etc. to customers
        ◦ The OPRA feed disseminates more than 100,000 quotes/sec
      • Sensor networks
      • Network traffic analysis
      • Transaction log analysis
  10. Desirable Functions of a Filtering Engine
      • Correctness: correctly matching incoming messages with subscription criteria
      • Expressiveness: rich subscription language
      • Efficiency: real-time matching
      • Scalability: handling a large number of subscriptions
      • Dynamic: capability to add and remove subscriptions online
  11. Related Work
      • Most existing systems support only conjunctive subscriptions
        ◦ GRYPHON
        ◦ SIENA
        ◦ Le Subscribe
      • Example: the following subscription requires 27 GRYPHON-like subscriptions, while a BDD handles it naturally. [subscription formula not preserved]
  12. Related Work
      • Some systems have higher expressive power at the expense of less efficient filtering
        ◦ ELVIN
      • Can we come up with an efficient filtering technique while providing an expressive subscription language?
      • BDD-based filtering may be employed in existing systems to improve matching efficiency
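The subscription behind the "27 GRYPHON-like subscriptions" figure is not reproduced in this transcript. As an illustration only, the sketch below assumes a subscription of the form (a1 or a2 or a3) and (b1 or b2 or b3) and (c1 or c2 or c3); the attribute names and values are hypothetical. A system that accepts only conjunctions has to register one subscription per combination of alternatives.

```python
from itertools import product

# Hypothetical subscription: each attribute may take any one of three values.
# This is an assumed shape, not the formula shown on the original slide.
alternatives = {
    "company": ["IBM", "Dell", "Siemens"],
    "cpu": ["800 MHz", "900 MHz", "1000 MHz"],
    "region": ["EU", "US", "ASIA"],
}


def to_conjunctive(alternatives):
    """Expand a conjunction of disjunctions into purely conjunctive subscriptions."""
    names = list(alternatives)
    return [dict(zip(names, combo)) for combo in product(*alternatives.values())]


conjunctive = to_conjunctive(alternatives)
print(len(conjunctive))  # 27
```

With k attributes and three alternatives each, the expansion has 3^k entries, which is why a language with native disjunction is attractive.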
  14. Subscription Query Language
      • The language used to describe subscription criteria or subscriptions
      • Three subscription languages of increasing complexity
        ◦ SiSL: Simple Subscription Language
        ◦ StSL: Strict Subscription Language
        ◦ DeSL: Default Subscription Language
  15. Messages and Attributes
      • V = <v1, .., vn> = a finite sequence of attributes
      • Each attribute vi has a type
      • Each attribute vi has a corresponding domain
      • Event schema = the sequence of the types of v1, .., vn
  16. Messages and Attributes
      • A message = an assignment of values to some (not necessarily all) of the attributes
      • Formally, a message is a mapping m such that for each attribute v, either m(v) is a value from the domain of v, or m(v) = * (m does not define v)
      • A message is total if it defines all attributes in V
  17. Messages and Attributes – Example 1
      • Let V = <company, product, price> over the event schema <STR, STR, DBL>
      • Consider the following message:
        <company>IBM</company>
        <product>PC AT, 20 Mhz, 256 KB RAM</product>
        <price>5000</price>
      • This describes a total message m1 where m1(company) = "IBM", m1(product) = "PC AT, 20 Mhz, 256 KB RAM" and m1(price) = 5000
  18. Messages and Attributes – Example 2
      • Consider the following message:
        <company>IBM</company>
        <product>PC AT, 20 Mhz, 256 KB RAM</product>
      • This describes a different message m2 which is not total (i.e. partial), since m2(price) = *
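The two example messages can be sketched as partial mappings. In this minimal sketch, `None` stands in for the `*` of the slides, and the helper names `make_message` and `is_total` are made up for illustration.

```python
UNDEFINED = None  # stands in for the '*' of the slides

ATTRIBUTES = ("company", "product", "price")  # V from Example 1


def make_message(**values):
    """Build a message over ATTRIBUTES; attributes not given stay undefined."""
    return {v: values.get(v, UNDEFINED) for v in ATTRIBUTES}


def is_total(message):
    """A message is total if it defines every attribute in V."""
    return all(message[v] is not UNDEFINED for v in ATTRIBUTES)


m1 = make_message(company="IBM", product="PC AT, 20 Mhz, 256 KB RAM", price=5000.0)
m2 = make_message(company="IBM", product="PC AT, 20 Mhz, 256 KB RAM")

print(is_total(m1))  # True
print(is_total(m2))  # False
```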
  19. Three Subscription Languages
      • SiSL: Simple Subscription Language
        ◦ All messages are total
      • StSL: Strict Subscription Language
        ◦ Messages define all attributes that occur in the query (subscription criteria)
        ◦ SiSL is a subset of StSL
      • DeSL: Default Subscription Language
        ◦ All attributes are initialized to default values (e.g. using NULL)
        ◦ Extends the functionality of SiSL to heterogeneous message formats
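  20. Formalizing SiSL Queries (Subscriptions)
      • Atomic formulas: let v be an attribute in V
        ◦ If v has a numeric type and c is a constant of that type, then the formulas v = c, v < c, c < v are atomic formulas
        ◦ For the other numeric types, atomic formulas are defined similarly
        ◦ If v has type STR, string comparisons on v, including a substring test, are atomic formulas [operator symbols not preserved]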
  21. Formalizing SiSL Queries (Subscriptions)
      • Atoms = the set of atomic formulas
      • A query is a Boolean combination of atomic formulas
      • Each query Q has an associated set of attributes occurring in Q and an associated set of atomic formulas occurring in Q
  22. Formalizing SiSL Queries (Subscriptions)
      • Abbreviations [formulas not preserved]
  23. Example: SiSL Query
      • The following SiSL query matches all messages for 1000 Mhz PCs manufactured by IBM, Dell or Siemens which cost at most $1000. [query formula not preserved]
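The query formula itself did not survive in this transcript. One plausible reading of the description, written as a plain predicate over a total message, is sketched below. The choice of atoms (a substring test on `product`, three equality tests on `company`, and "at most" expanded into `<` or `=`) is an assumption, not the slide's formula.

```python
def example_query(m):
    """Hypothetical encoding of the described SiSL query over a total message m."""
    made_by = m["company"] in ("IBM", "Dell", "Siemens")    # company = IBM or Dell or Siemens
    is_1000mhz = "1000 Mhz" in m["product"]                 # substring test on product
    affordable = m["price"] < 1000 or m["price"] == 1000    # price at most 1000
    return made_by and is_1000mhz and affordable


cheap_dell = {"company": "Dell", "product": "PC, 1000 Mhz, 128 MB RAM", "price": 950.0}
old_ibm = {"company": "IBM", "product": "PC AT, 20 Mhz, 256 KB RAM", "price": 5000.0}

print(example_query(cheap_dell))  # True
print(example_query(old_ibm))     # False
```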
  24. Formalizing SiSL Queries (Subscriptions)
      • Definition: the instantiation of a query Q by a message m is the query obtained from Q by replacing every variable v for which m(v) ≠ * by m(v)
      • Definition: the SiSL query Q matches the total message m if the instantiation of Q by m evaluates to true
  25. Formalizing StSL Queries (Subscriptions)
      • StSL (Strict Subscription Language) is a generalization of SiSL
      • Definition (adequacy): a message m is adequate for a query Q if, for every attribute v occurring in Q, it holds that m(v) ≠ *
      • Definition: the query Q matches m iff m is adequate for Q and the instantiation of Q by m evaluates to true
  26. Formalizing DeSL Queries (Subscriptions)
      • DeSL (Default Subscription Language) is the most general of the three
      • For each attribute vi, there is a default value
      • Definition: the default extension of m agrees with m on every attribute that m defines and assigns the default value to every attribute that m leaves undefined
  27. Formalizing DeSL Queries (Subscriptions)
      • Definition: the query Q matches the message m under default semantics if the instantiation of Q by the default extension of m evaluates to true
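The three matching definitions can be put side by side in a small sketch. It works directly on query syntax trees rather than BDDs. The tuple encoding of queries, the operator table, and all function names are assumptions made for this illustration; `None` again plays the role of `*`.

```python
import operator

UNDEFINED = None  # plays the role of '*'

# Assumed operator names; the slides' own symbols are not preserved.
OPS = {
    "=": operator.eq,
    "<": operator.lt,
    ">": operator.gt,
    "contains": lambda value, s: s in value,
}


def attributes(q):
    """Set of attributes occurring in query q."""
    if q[0] == "atom":
        return {q[1]}
    return set().union(*(attributes(sub) for sub in q[1:]))


def evaluate(q, m):
    """Truth value of q instantiated by m; m must define every attribute of q."""
    kind = q[0]
    if kind == "atom":
        _, attr, op, const = q
        return OPS[op](m[attr], const)
    if kind == "not":
        return not evaluate(q[1], m)
    if kind == "and":
        return all(evaluate(sub, m) for sub in q[1:])
    if kind == "or":
        return any(evaluate(sub, m) for sub in q[1:])
    raise ValueError(f"unknown connective: {kind}")


def is_adequate(q, m):
    """m is adequate for q if it defines every attribute occurring in q."""
    return all(m.get(v, UNDEFINED) is not UNDEFINED for v in attributes(q))


def matches_strict(q, m):
    """StSL: adequate and true. On total messages this coincides with SiSL."""
    return is_adequate(q, m) and evaluate(q, m)


def default_extension(m, defaults):
    """Keep the values m defines; fill every other attribute with its default."""
    extended = dict(defaults)
    extended.update({v: x for v, x in m.items() if x is not UNDEFINED})
    return extended


def matches_default(q, m, defaults):
    """DeSL: evaluate q on the default extension of m."""
    return evaluate(q, default_extension(m, defaults))


q = ("and", ("atom", "company", "=", "IBM"), ("atom", "price", "<", 6000))
m2 = {"company": "IBM", "product": "PC AT, 20 Mhz, 256 KB RAM", "price": UNDEFINED}
defaults = {"company": "", "product": "", "price": 0.0}

print(matches_strict(q, m2))             # False: price is undefined
print(matches_default(q, m2, defaults))  # True: price defaults to 0.0
```

The example shows why the choice of semantics matters: the same partial message is rejected under strict semantics and accepted under default semantics.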
  29. BDDs (Binary Decision Diagrams)
      • Notation [symbols not preserved]:
        ◦ A = a set of propositional variables
        ◦ A linear ordering (variable ordering) on A
        ◦ An ordered BDD over A, whose non-terminal nodes are labeled by variables in A and whose terminals are labeled by 0 or 1
        ◦ The Boolean function represented by node v in the BDD
  30. Properties of BDDs
      • Each non-terminal node v has two out-edges: a low edge and a high edge
      • Let a non-terminal node v with label ai have successors u and w at the low and high edges respectively. Then f_v = (¬ai ∧ f_u) ∨ (ai ∧ f_w), where f_x denotes the Boolean function represented by node x
      • Size = number of nodes in the BDD
  31. Example: BDD
      • The following BDD represents the Boolean function x AND (y OR z). [figure and variable ordering not preserved]
  32. Shared BDDs (SBDDs)
      • While an OBDD represents one Boolean function, an SBDD represents multiple Boolean functions
      • An SBDD is a collection of component OBDDs respecting the same variable ordering
      • An SBDD has a set of output nodes Vo = {o1, …, on}, corresponding to the Boolean functions <f1, …, fn> respectively
  33. SBDDs
      • Every root node of a component OBDD belongs to Vo
      • Notation: the BDD together with its output nodes {o1, …, on} [symbol not preserved]
      • [subject not preserved] is polynomial time computable from any other shared BDD over A for <f1, …, fn>
  34. Example: Shared BDD
      [Figure not preserved; the slide lists the Boolean functions represented by nodes 1, 2 and 3]
  35. BDD Data Structure
      • A BDD with n nodes is represented as a graph whose vertices are the natural numbers 1, …, n
      • The adjacency relationship is described by an array of size n
      • i-th element = (low[i], high[i], label[i], value[i])
        ◦ low[i] = low successor of i
        ◦ high[i] = high successor of i
        ◦ label[i] = label of i
        ◦ value[i] = used later to store the result of the BDD evaluation corresponding to i
  36. BDD Evaluation
      [Algorithm listing (EvalBDD) not preserved]
      • The algorithm computes the value of each node in the BDD under a given truth assignment to the variables in A, where the i-th component of the assignment is the value of ai
  37. BDD Evaluation
      • Notice that we can compute the value of the Boolean functions associated with each output node in one pass
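The array layout and the one-pass evaluation can be sketched as follows. This is not the slide's EvalBDD listing, which is not preserved. It assumes that indices 0 and 1 hold the terminals and that every node has a larger index than its successors, so a single left-to-right sweep is a bottom-up pass. The example encodes x AND (y OR z) under the assumed ordering x < y < z.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    low: Optional[int]            # index of the low successor (None for terminals)
    high: Optional[int]           # index of the high successor (None for terminals)
    label: Optional[str]          # variable name (None for terminals)
    value: Optional[int] = None   # filled in by eval_bdd


def eval_bdd(nodes, assignment):
    """Compute the value of every node in one bottom-up pass.

    Assumes nodes[0] and nodes[1] are the terminals 0 and 1, and that each
    non-terminal has a larger index than both of its successors.
    """
    nodes[0].value, nodes[1].value = 0, 1
    for i in range(2, len(nodes)):
        n = nodes[i]
        child = n.high if assignment[n.label] else n.low
        n.value = nodes[child].value
    return [n.value for n in nodes]


# x AND (y OR z), variable ordering x < y < z (assumed)
bdd = [
    Node(None, None, None),  # 0: terminal 0
    Node(None, None, None),  # 1: terminal 1
    Node(0, 1, "z"),         # 2: z
    Node(2, 1, "y"),         # 3: y OR z
    Node(0, 3, "x"),         # 4: x AND (y OR z)
]

values = eval_bdd(bdd, {"x": True, "y": False, "z": True})
print(values[4])  # 1
```

Because every node receives a value in the same sweep, several output nodes of a shared BDD can be read off after one pass.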
  38. BDD Restrictions
      • The idea is to restrict the possible truth assignments to those under which an external constraint f (a Boolean function over A) evaluates to true
      • Definition: f-restriction [definition not preserved]
  40. Query BDDs
      • Key idea
        ◦ Represent many subscription queries by a single shared BDD whose nodes correspond to atomic sub-formulas of the queries
        ◦ Messages are matched against queries by simply running EvalBDD on the shared BDD
  41. Query BDDs
      • Let Q1, …, Qn be a sequence of queries over the set of attributes V
      • A = the set of atomic sub-formulas of the queries
      • Each atomic sub-formula a in A is assigned a propositional variable
      • The Boolean query for Qi is obtained by substituting each atomic sub-formula a with its propositional variable
  42. Example: Query BDDs
      • Let two subscriptions be received [formulas not preserved]
      • They contain three atomic sub-formulas, hence three propositional variables
  43. Example: Query BDDs
      • Variable order: [not preserved]
      [Figure: SBDD corresponding to the queries]
  44. Query Matching: SiSL
      • Use the EvalBDD algorithm for query matching
      • A query Qi is considered matched if the BDD node corresponding to Qi evaluates to 1
      • Bottom-up evaluation makes sure sub-queries are evaluated only once
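An end-to-end sketch of SiSL matching on a shared BDD is given below. The two queries, their atoms, and the hand-built SBDD are hypothetical and are not the example from the slides: Q1 is price < 1000 AND (company = IBM OR company = Dell), and Q2 is company = IBM OR company = Dell, under the assumed variable order p1 < p2 < p3. Each atom is tested once against the message, and one bottom-up pass then yields every output node.

```python
# Atomic sub-formulas, each assigned a propositional variable (names assumed).
ATOMS = {
    "p1": lambda m: m["price"] < 1000,
    "p2": lambda m: m["company"] == "IBM",
    "p3": lambda m: m["company"] == "Dell",
}

# Shared BDD as (low, high, label) triples; children precede parents.
SBDD = [
    (None, None, None),  # 0: terminal 0
    (None, None, None),  # 1: terminal 1
    (0, 1, "p3"),        # 2: p3
    (2, 1, "p2"),        # 3: p2 OR p3          (output node of Q2)
    (0, 3, "p1"),        # 4: p1 AND (p2 OR p3) (output node of Q1)
]
OUTPUTS = {"Q1": 4, "Q2": 3}


def match(message):
    """Return the names of all queries matched by a total message."""
    assignment = {p: test(message) for p, test in ATOMS.items()}
    value = [0, 1] + [None] * (len(SBDD) - 2)
    for i in range(2, len(SBDD)):
        low, high, label = SBDD[i]
        value[i] = value[high if assignment[label] else low]
    return sorted(q for q, node in OUTPUTS.items() if value[node] == 1)


print(match({"company": "Dell", "price": 1500.0}))  # ['Q2']
print(match({"company": "IBM", "price": 900.0}))    # ['Q1', 'Q2']
```

Node 3 is shared: it is the output node of Q2 and also a sub-graph of Q1, so its value is computed once per message.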
  45. Query Matching: DeSL
      • Same as handling complete messages
      • When a message is received, it is extended to a total message before performing the matching
  46. Query Matching: StSL
      • Recall that a message m matches a subscription Q iff m is adequate for Q and m satisfies Q
      • A modified EvalBDD can be used to perform faster matching
      • Key ideas
        ◦ An undefined atom renders all sub-formulas in which it occurs undefined
        ◦ Treat * as a new value: undefined
  47. Query Matching: StSL
      • MVEvalBDD for StSL is significantly faster than EvalBDD for SiSL
  49. # Nodes in SBDD vs. # Subscriptions
      • The number of nodes scales almost linearly
        ◦ High scalability
      • Restriction further reduces node count, minimizing memory requirements
  50. Matching time for SiSL and StSL
      [Plot: time for StSL queries]
      • Inputs: number of subscription queries and message density (how total)
      • Partial messages can be matched quickly
  52. Variable Ordering vs. BDD size
      • Variable ordering has a tremendous influence on BDD size
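The effect of variable ordering can be seen on a standard textbook function, (x1 AND y1) OR (x2 AND y2) OR (x3 AND y3), which is not taken from the slides. The sketch below builds a reduced ordered BDD by brute-force Shannon expansion (exponential in the number of variables, so only suitable for tiny examples) and counts its non-terminal nodes under two orderings. The helper name `robdd_size` is made up for this illustration.

```python
def robdd_size(f, order):
    """Number of non-terminal nodes in the reduced OBDD of f under `order`.

    f takes a dict {variable: bool}. Nodes are built bottom-up and
    deduplicated through a unique table keyed by (variable, low, high).
    """
    unique = {}  # (var, low_id, high_id) -> node id; ids 0 and 1 are the terminals

    def build(i, env):
        if i == len(order):
            return int(bool(f(env)))           # terminal 0 or 1
        var = order[i]
        low = build(i + 1, {**env, var: False})
        high = build(i + 1, {**env, var: True})
        if low == high:                        # redundant test: skip the node
            return low
        key = (var, low, high)
        if key not in unique:                  # share isomorphic sub-graphs
            unique[key] = len(unique) + 2
        return unique[key]

    build(0, {})
    return len(unique)


def f(env):
    return (
        (env["x1"] and env["y1"])
        or (env["x2"] and env["y2"])
        or (env["x3"] and env["y3"])
    )


interleaved = ["x1", "y1", "x2", "y2", "x3", "y3"]
separated = ["x1", "x2", "x3", "y1", "y2", "y3"]

print(robdd_size(f, interleaved))  # 6
print(robdd_size(f, separated))    # 14
```

With n pairs instead of three, the interleaved ordering needs 2n non-terminal nodes while the separated ordering needs 2^(n+1) - 2, so the gap grows exponentially.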
  53. Pros
      • Introduces a well-formed semantics to describe the matching process in publish-subscribe systems
      • Treating matching as satisfiability checking in an SBDD allows multiple subscriptions to be checked incrementally
      • Scalable
      • StSL is more efficient than SiSL
  54. Cons/Improvements
      • Does not describe any heuristics to select the node ordering (NP-hard)
        ◦ Can we order based on the significance of the attributes involved?
      • Does not explore the possibility of eliminating redundancies due to semantically related atomic sub-formulas (e.g. price = 100 and price > 80) (again NP-hard)
        ◦ Can we further reduce the node count by exploiting the semantics without causing side effects?
      • Efficiency of matching is not compared with existing systems
  55. Conclusion
      • Two major contributions
        ◦ A precise semantics to match messages to subscriptions
        ◦ Modeling filtering as a satisfiability check in BDD
  56. Questions
  57. Thank You
