Slides prepared based on the paper "Efficient Filtering in Publish-Subscribe Systems using BDD" by Alexis Campailla, Sagar Chaki, Edmund Clarke, Somesh Jha and Helmut Veith
Efficient Filtering in Pub-Sub Systems using BDD
1. Efficient Filtering in Publish-Subscribe Systems using BDD
Alexis Campailla, Sagar Chaki, Edmund Clarke,
Somesh Jha, Helmut Veith
Prepared by Nabeel Mohamed
4/16/08
2. Outline
Research problem at hand
Content-based Publish-Subscribe
Subscription Query Language
BDD Semantics
BDD Based matching
Experimental Results
Discussion (Pros and Cons)
3. Research Problem at Hand
Loosely coupled interactions in publish-subscribe systems allow very large-scale systems to be built
However, the filtering techniques used are a major bottleneck
The efficiency of the filtering technique plays a major role in scalability
Whatever technique we use should be provably correct
4. Major Contributions
A precise semantics for matching messages (events) to subscriptions (subscription queries)
Modeling filtering as a satisfiability check on BDDs
5. Roadmap
Research problem at hand
Content-based Publish-Subscribe
Subscription Query Language
BDD Semantics
BDD Based matching
Experimental Results
Discussion (Pros and Cons)
7. Publish-Subscribe Systems
Publishers and subscribers are loosely coupled
◦ Space decoupled
◦ Time decoupled
◦ Synchronization decoupled
Content routers (brokers) form a structured P2P system
Scalable systems
8. Message (Event) Filtering
Filtering
◦ Matching incoming messages (events) generated by publishers against subscription criteria
◦ A main task of content routers (brokers), performed by their filtering engine
Content-based pub-sub systems route messages (events) based on the content itself
Example: in a financial ticker, filter quotes with symbol = Google and offer price < 400.
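As a minimal sketch of the filtering task, assuming a hypothetical representation in which a message is a Python dict and each subscription is a predicate over that dict, a filtering engine tests every incoming message against every subscription and forwards it to the subscribers whose criteria hold. The attribute names `symbol` and `offer_price` and the subscriber name are illustrative, not taken from the slides.

```python
from typing import Callable, Dict, List, Union

Message = Dict[str, Union[str, float]]
Predicate = Callable[[Message], bool]

def google_under_400(m: Message) -> bool:
    """symbol = Google AND offer price < 400 (a missing price never matches)."""
    price = m.get("offer_price")
    return m.get("symbol") == "Google" and isinstance(price, (int, float)) and price < 400

# Hypothetical subscription table: subscriber name -> criteria.
SUBSCRIPTIONS: Dict[str, Predicate] = {"google_under_400": google_under_400}

def route(message: Message) -> List[str]:
    """Naive content-based filtering: test the message against every subscription."""
    return [name for name, matches in SUBSCRIPTIONS.items() if matches(message)]

print(route({"symbol": "Google", "offer_price": 395.5}))  # ['google_under_400']
print(route({"symbol": "Google", "offer_price": 410.0}))  # []
```

This naive loop evaluates each subscription separately, so its per-message cost grows with the number of subscriptions; that is the scalability concern the later slides address.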
9. Example Pub-Sub Systems
Stock market feeds
◦ For delivery of financial data such as
stock quotes, trade reports, news, etc. to
customers
◦ OPRA feed disseminates more than
100,000 quotes/sec
Sensor networks
Network traffic analysis
Transaction log analysis
10. Desirable Functions of a Filtering Engine
Correctness:
◦ Correctly matching incoming messages with
subscription criteria
Expressiveness:
◦ Rich subscription language
Efficiency:
◦ Real-time matching
Scalability:
◦ Handling a large number of subscriptions
Dynamic:
◦ Capability to add and remove subscriptions
online
11. Related Work
Most existing systems support only
conjunctive subscriptions
◦ GRYPHON
◦ SIENA
◦ Le Subscribe
Example: the following subscription requires 27 GRYPHON-like (conjunctive) subscriptions, while a BDD handles it naturally.
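The slide's subscription is not reproduced in this text. As a hypothetical illustration of where a count like 27 can come from: a query that is a conjunction of three disjunctions of three atoms each has to be distributed into disjunctive normal form before a conjunctive-only system can accept it, which yields 3 × 3 × 3 = 27 conjunctive subscriptions. The atom names below are placeholders.

```python
from itertools import product

# Hypothetical subscription: (a1 | a2 | a3) & (b1 | b2 | b3) & (c1 | c2 | c3)
cnf = [["a1", "a2", "a3"], ["b1", "b2", "b3"], ["c1", "c2", "c3"]]

# A conjunctive-only system needs one subscription per way of picking
# one atom from each clause (distributing & over |).
conjunctive_subs = [" & ".join(choice) for choice in product(*cnf)]

print(len(conjunctive_subs))  # 27
print(conjunctive_subs[0])    # a1 & b1 & c1
```

A BDD, by contrast, can represent the original formula directly, without this expansion.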
12. Related Work
Some systems have higher expressive
power at the expense of less efficient
filtering.
◦ ELVIN
Can we come up with an efficient
filtering technique while providing an
expressive subscription language?
BDD-based filtering may be employed in existing systems to improve matching efficiency
13. Roadmap
Research problem at hand
Content-based Publish-Subscribe
Subscription Query Language
BDD Semantics
BDD Based matching
Experimental Results
Discussion (Pros and Cons)
14. Subscription Query Language
The language used to describe
subscription criteria or subscriptions
Three Subscription Languages of
increasing complexity
◦ SiSL – Simple Subscription Language
◦ StSL – Strict Subscription Language
◦ DeSL – Default Subscription Language
15. Messages and Attributes
V = <v1, …, vn> = a finite sequence of attributes
Each attribute vi has a type Ti
Each attribute vi has a corresponding domain Di
Event schema = <T1, …, Tn>, the sequence of attribute types
16. Messages and Attributes
A message = an assignment of values to some (not necessarily all) of the attributes
Formally, a message is a mapping m such that for each attribute v, either m(v) is a value in the domain of v or m(v) = *, where (m(v) = *) ≡ (m does not define v)
A message is total if it defines all attributes in V.
17. Messages and Attributes – Example 1
Let V = <company, product, price>
over the event schema <STR, STR,
DBL>
Consider the following message:
<company> IBM </company>
<product>PC AT, 20 Mhz, 256 KB RAM</product>
<price>5000</price>
This describes a
total message m1
where m1(company) = “IBM”,
m1(product) = “PC AT, 20 Mhz, 256 KB
RAM” and m1(price) = 5000.
18. Messages and Attributes – Example 2
Consider the following message:
<company> IBM </company>
<product>PC AT, 20 Mhz, 256 KB RAM</product>
This describes a different message m2
which is not total (i.e. partial), since
m2(price) = *.
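Examples 1 and 2 can be encoded in a minimal sketch, assuming a hypothetical representation in which a message is a Python dict and an attribute absent from the dict is undefined (*). A dedicated sentinel object stands for * so that it cannot be confused with a real string value; `STAR`, `value_of` and `is_total` are illustrative names.

```python
from typing import Dict, Union

Value = Union[str, int, float]
Message = Dict[str, Value]   # hypothetical encoding: an absent key means m(v) = *

V = ("company", "product", "price")   # attribute sequence from Example 1

class _Star:
    """Sentinel for * (undefined), distinct from every real attribute value."""
    def __repr__(self) -> str:
        return "*"

STAR = _Star()

def value_of(m: Message, v: str):
    """m(v), or STAR when m does not define v."""
    return m.get(v, STAR)

def is_total(m: Message) -> bool:
    """A message is total if it defines every attribute in V."""
    return all(v in m for v in V)

m1: Message = {"company": "IBM", "product": "PC AT, 20 Mhz, 256 KB RAM", "price": 5000}
m2: Message = {"company": "IBM", "product": "PC AT, 20 Mhz, 256 KB RAM"}

print(is_total(m1), is_total(m2))  # True False
print(value_of(m2, "price"))       # *
```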
19. Three Subscription Languages
SiSL – Simple Subscription Language
◦ All messages are total
StSL – Strict Subscription Language
◦ Messages define all attributes that occur in
the query (subscription criteria)
◦ SiSL is a subset of StSL
DeSL – Default Subscription Language
◦ All attributes are initialized to default values
(e.g. using NULL)
◦ Extends the functionality of SiSL to
heterogeneous message formats
20. Formalizing SiSL Queries (Subscriptions)
Atomic formulas
Let v be an attribute in V
If v has an integer type and c is a constant in the domain of v, then the formulas v = c, v < c, c < v are atomic formulas.
If v has type DBL, atomic formulas are defined similarly.
If v has type STR and c is a string, then the formulas v = c and c ⊑ v are atomic formulas (⊑ ≡ substring).
21. Formalizing SiSL Queries (Subscriptions)
Atoms = the set of atomic formulas
A query φ is a Boolean combination of atomic formulas
Var(φ) = the set of attributes occurring in φ
Atoms(φ) = the set of atomic formulas occurring in φ
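A minimal sketch of these definitions, assuming a hypothetical nested-tuple encoding of queries (the slides do not prescribe a representation): `attrs` and `atoms` compute the set of attributes and the set of atomic formulas occurring in a query. The operator spellings inside the atoms are illustrative.

```python
from typing import Set

# Hypothetical encoding of queries as nested tuples:
#   ("atom", attr, op, const)  e.g. ("atom", "price", "<", 1000)
#   ("and", q1, q2, ...), ("or", q1, q2, ...), ("not", q)
Query = tuple

def attrs(q: Query) -> Set[str]:
    """The set of attributes occurring in q."""
    if q[0] == "atom":
        return {q[1]}
    return set().union(*(attrs(sub) for sub in q[1:]))

def atoms(q: Query) -> Set[Query]:
    """The set of atomic formulas occurring in q."""
    if q[0] == "atom":
        return {q}
    return set().union(*(atoms(sub) for sub in q[1:]))

q = ("and",
     ("or", ("atom", "company", "=", "IBM"), ("atom", "company", "=", "Dell")),
     ("not", ("atom", "price", "<", 1000)))

print(sorted(attrs(q)))  # ['company', 'price']
print(len(atoms(q)))     # 3
```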
23. Example: SiSL Query
The following SiSL query matches all messages for 1000 Mhz PCs manufactured by IBM, Dell or Siemens that cost at most $1000.
24. Formalizing SiSL Queries (Subscriptions)
φ[m] = the instantiation of a query φ by a message m
Definition:
φ[m] is the query obtained from φ by replacing every attribute v for which m(v) ≠ * with m(v).
Definition:
The SiSL query φ matches the total message m if φ[m] evaluates to true.
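The query on slide 23 is not reproduced in this text; the sketch below encodes one plausible formalization of its English description, using a hypothetical nested-tuple encoding. Because atomic formulas compare only with =, < and substring, "at most $1000" is written as price < 1000 OR price = 1000. `instantiate` substitutes m(v) for every attribute that m defines and then simplifies constant sub-formulas, so a total message yields a truth value while a partial one leaves a residual query. All names are hypothetical.

```python
from typing import Any, Dict, List, Union

# ("atom", attr, op, const): op "<" means attr < const, ">" means const < attr,
# "substr" means const is a substring of attr; plus ("and", ...), ("or", ...), ("not", q)
Query = tuple
Message = Dict[str, Any]       # an absent key means m(v) = *
Result = Union[bool, Query]    # fully instantiated -> bool, otherwise a residual query

def eval_atom(op: str, value: Any, const: Any) -> bool:
    if op == "=":
        return value == const
    if op == "<":
        return value < const
    if op == ">":
        return const < value
    if op == "substr":
        return const in value
    raise ValueError(f"unknown operator {op!r}")

def _combine(tag: str, subs: List[Result]) -> Result:
    """Simplify an 'and'/'or' node whose children may already be booleans."""
    absorbing = tag == "or"               # True absorbs 'or'; False absorbs 'and'
    if any(s is absorbing for s in subs):
        return absorbing
    rest = [s for s in subs if not isinstance(s, bool)]
    if not rest:
        return not absorbing              # every child was the neutral element
    return rest[0] if len(rest) == 1 else (tag, *rest)

def instantiate(q: Query, m: Message) -> Result:
    """Replace every attribute v with m(v) where m defines v, then simplify."""
    if q[0] == "atom":
        _, attr, op, const = q
        return eval_atom(op, m[attr], const) if attr in m else q
    subs = [instantiate(sub, m) for sub in q[1:]]
    if q[0] == "not":
        return (not subs[0]) if isinstance(subs[0], bool) else ("not", subs[0])
    return _combine(q[0], subs)

def sisl_match(q: Query, m: Message) -> bool:
    """SiSL: q matches the total message m iff its instantiation evaluates to true."""
    return instantiate(q, m) is True

pc_query = ("and",
            ("or", ("atom", "company", "=", "IBM"),
                   ("atom", "company", "=", "Dell"),
                   ("atom", "company", "=", "Siemens")),
            ("atom", "product", "substr", "1000 Mhz"),
            ("or", ("atom", "price", "<", 1000), ("atom", "price", "=", 1000)))

total = {"company": "Dell", "product": "Desktop PC, 1000 Mhz", "price": 999.0}
partial = {"company": "Dell", "product": "Desktop PC, 1000 Mhz"}
print(sisl_match(pc_query, total))    # True
print(instantiate(pc_query, partial))  # residual query over price only
```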
25. Formalizing StSL Queries (Subscriptions)
StSL (Strict Subscription Language) is a generalization of SiSL.
Definition: adequacy
A message m is adequate for a query φ if m(v) ≠ * holds for every attribute v occurring in φ.
Definition:
The query φ matches m iff m is adequate for φ and m satisfies φ (i.e., φ evaluates to true once each of its attributes v is replaced by m(v)).
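A minimal sketch of adequacy and StSL matching, assuming a hypothetical encoding in which a query carries the set of attributes occurring in it plus a predicate that evaluates it once those attributes are instantiated. The example query is illustrative.

```python
from typing import Any, Callable, Dict, FrozenSet, NamedTuple

Message = Dict[str, Any]   # hypothetical encoding: an absent key means m(v) = *

class Query(NamedTuple):
    """Attributes occurring in the query, plus a predicate over a message
    that defines all of them (hypothetical encoding)."""
    attrs: FrozenSet[str]
    predicate: Callable[[Message], bool]

def adequate(m: Message, q: Query) -> bool:
    """m is adequate for q iff m(v) != * for every attribute v occurring in q."""
    return all(v in m for v in q.attrs)

def stsl_match(q: Query, m: Message) -> bool:
    """StSL: q matches m iff m is adequate for q and m satisfies q.
    The short-circuit 'and' keeps the predicate from reading undefined attributes."""
    return adequate(m, q) and q.predicate(m)

ibm_or_cheap = Query(frozenset({"company", "price"}),
                     lambda m: m["company"] == "IBM" or m["price"] < 1000)

print(stsl_match(ibm_or_cheap, {"company": "IBM", "price": 5000}))  # True
print(stsl_match(ibm_or_cheap, {"company": "IBM"}))                 # False: price is undefined
```

Under a naive "evaluate whatever is defined" reading, the second message would match because company = "IBM" already makes the disjunction true; the adequacy requirement rules that out.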
26. Formalizing DeSL Queries (Subscriptions)
DeSL (Default Subscription Language) is the most general of the three.
For each attribute vi, there is a default value di
Definition:
The default extension m̂ of m is defined as follows: m̂(vi) = m(vi) if m(vi) ≠ *, and m̂(vi) = di otherwise.
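A minimal sketch of the default extension and of DeSL-style matching (extend to a total message, then match as for total messages), assuming a hypothetical dict encoding of messages. The default values below are placeholders; the slides do not fix them.

```python
from typing import Any, Callable, Dict

Message = Dict[str, Any]   # hypothetical encoding: an absent key means m(v) = *

# Hypothetical defaults d_i for V = <company, product, price>.
DEFAULTS: Dict[str, Any] = {"company": "", "product": "", "price": 0.0}

def default_extension(m: Message, defaults: Dict[str, Any] = DEFAULTS) -> Message:
    """Total message equal to m(v) where m defines v, and to the default d_v elsewhere."""
    return {v: m.get(v, d) for v, d in defaults.items()}

def desl_match(predicate: Callable[[Message], bool], m: Message) -> bool:
    """Extend m to a total message, then evaluate the query on it."""
    return predicate(default_extension(m))

def cheap_ibm(m: Message) -> bool:
    return m["company"] == "IBM" and m["price"] < 1000

print(default_extension({"company": "IBM"}))
# {'company': 'IBM', 'product': '', 'price': 0.0}
print(desl_match(cheap_ibm, {"company": "IBM"}))  # True: price defaults to 0.0
```

The choice of defaults changes which partial messages match, as the second call shows.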
28. Roadmap
Research problem at hand
Content-based Publish-Subscribe
Subscription Query Language
BDD Semantics
BDD Based matching
Experimental Results
Discussion (Pros and Cons)
29. BDDs (Binary Decision Diagrams)
Notations
A = a set of propositional variables
< = a linear ordering (variable ordering) on A
B = an ordered BDD over A, whose non-terminal nodes are labeled by variables in A and whose terminals are labeled 0 or 1
fv = the Boolean function represented by node v in B
30. Properties of BDDs
Each non-terminal node v has two out-edges: a low edge and a high edge
Let a non-terminal node v with label ai have successors u and w at its low and high edges, respectively. Then the Boolean functions fv, fu, fw represented by v, u, w satisfy
fv ≡ (¬ai ∧ fu) ∨ (ai ∧ fw)
Size = number of nodes in the BDD
31. Example: BDD
The following BDD represents the Boolean function x AND (y OR z).
The variable ordering is
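Assuming the ordering x < y < z (the slide's own ordering is not reproduced here), the BDD for x AND (y OR z) has three non-terminal nodes. The sketch encodes nodes as hypothetical (label, low, high) tuples and evaluates them with the Shannon expansion from slide 30; for a concrete assignment, the expansion reduces to following the high edge when the label variable is 1 and the low edge otherwise. It then checks the BDD against the formula on all eight assignments.

```python
from itertools import product
from typing import Dict, Union

Node = Union[int, tuple]   # a terminal 0/1, or a (label, low, high) tuple

# x AND (y OR z) under the assumed ordering x < y < z
z_node = ("z", 0, 1)
y_node = ("y", z_node, 1)   # y = 0: test z;   y = 1: true
root = ("x", 0, y_node)     # x = 0: false;    x = 1: test y OR z

def f(node: Node, env: Dict[str, int]) -> int:
    """f_v = (NOT a AND f_low) OR (a AND f_high): follow the edge selected by env[a]."""
    if not isinstance(node, tuple):
        return node
    label, low, high = node
    return f(high, env) if env[label] else f(low, env)

for x, y, z in product((0, 1), repeat=3):
    assert f(root, {"x": x, "y": y, "z": z}) == (x and (y or z))
print("BDD agrees with x AND (y OR z) on all 8 assignments")
```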
32. Shared BDDs (SBDDs)
While an OBDD represents one Boolean function, an SBDD represents multiple Boolean functions.
An SBDD is a collection of component OBDDs that respect the same variable ordering.
An SBDD has a set of output nodes Vo = {o1, …, on}, corresponding to the Boolean functions <f1, …, fn>, respectively.
33. SBDDs
Every root node of a component OBDD is in Vo
Notation: SBDD(f1, …, fn) denotes the BDD together with its output nodes {o1, …, on}
SBDD(f1, …, fn) is polynomial-time computable from any other shared BDD over A for <f1, …, fn>
35. BDD Data Structure
A BDD with n nodes is represented as a
graph whose vertices are the natural
numbers 1,…, n.
The adjacency relationship is described
by an array of size n.
ith element = (low[i], high[i], label[i],
value[i])
◦ low[i] = low successor of i
◦ high[i] = high successor of i
◦ label[i] = label of i
◦ value[i] = used later to store the result of the
BDD evaluation corresponding to i.
36. BDD Evaluation
The above algorithm computes the value of each node in the BDD under a truth assignment ā to the variables in A, where āi, the ith component of ā, is the value assigned to variable ai
37. BDD Evaluation
Notice that we can compute the values of the Boolean functions associated with all output nodes in a single pass.
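A minimal sketch of the array representation and a bottom-up evaluation pass, assuming (hypothetically) 0-based Python lists, indices 0 and 1 for the terminals, and non-terminals numbered so that every successor has a smaller index than its parent. The BDD is x AND (y OR z); because node 3 already represents y OR z, treating it as a second output node shows how one pass yields the values of all output functions.

```python
from typing import Dict, List

# Hypothetical layout mirroring low[i], high[i], label[i], value[i]:
# ids 0 and 1 are the terminals, and successors always have smaller ids,
# so a left-to-right sweep is bottom-up. Variables: x = 0, y = 1, z = 2.
low = [-1, -1, 0, 2, 0]
high = [-1, -1, 1, 1, 3]
label = [-1, -1, 2, 1, 0]   # node 2 tests z, node 3 tests y, node 4 tests x

# Output nodes: node 4 is x AND (y OR z); node 3 (reused by node 4) is y OR z.
OUTPUTS: Dict[str, int] = {"f1": 4, "f2": 3}

def eval_bdd(assignment: List[int]) -> List[int]:
    """Each non-terminal copies the value of the successor its variable selects,
    so every node, and hence every output, is computed in a single pass."""
    value = [0, 1] + [0] * (len(label) - 2)
    for i in range(2, len(label)):
        value[i] = value[high[i]] if assignment[label[i]] else value[low[i]]
    return value

value = eval_bdd([0, 1, 0])   # x = 0, y = 1, z = 0
print({name: value[node] for name, node in OUTPUTS.items()})  # {'f1': 0, 'f2': 1}
```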
38. BDD Restrictions
The idea is to restrict the possible truth assignments to those under which an external constraint f (a Boolean function over A) evaluates to true
Definition: f-restriction
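The formal definition of f-restriction is not reproduced here, so the following is only a hedged illustration of the idea, not the paper's definition or algorithm; the atoms, constraint and functions are hypothetical. Two atoms over the same attribute, a = (company = "IBM") and b = (company = "Dell"), can never both hold, so f = NOT (a AND b). Truth assignments that violate f never arise from a real message, so a query only has to be represented correctly on assignments satisfying f, and a simpler function that agrees there can replace it.

```python
from itertools import product

# Hypothetical atoms: a = (company = "IBM"), b = (company = "Dell").
# A message has one company value, so a and b are never both true.
def f(a: bool, b: bool) -> bool:
    """External constraint over the atoms: NOT (a AND b)."""
    return not (a and b)

def query(a: bool, b: bool) -> bool:
    """Subscription: company = "IBM" AND NOT company = "Dell"."""
    return a and not b

def restricted(a: bool, b: bool) -> bool:
    """Candidate replacement that no longer tests b."""
    return a

assignments = list(product((False, True), repeat=2))
admissible = [(a, b) for a, b in assignments if f(a, b)]

# The two functions agree on every assignment satisfying f ...
assert all(query(a, b) == restricted(a, b) for a, b in admissible)
# ... and differ only on (True, True), which f rules out.
print([(a, b) for a, b in assignments if query(a, b) != restricted(a, b)])  # [(True, True)]
```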
39. Roadmap
Research problem at hand
Content-based Publish-Subscribe
Subscription Query Language
BDD Semantics
BDD Based matching
Experimental Results
Discussion (Pros and Cons)
40. Query BDDs
Key Idea
◦ Represent many subscription queries by a
single shared BDD whose nodes
correspond to atomic sub-formulas of the
queries.
◦ Messages are matched against queries
by simply running EvalBDD on the shared
BDD.
41. Query BDDs
Q̄ = <Q1, …, Qn>, a sequence of queries over the set of attributes V
A = the set of atomic sub-formulas of the queries
P is the set of propositional variables such that each atomic sub-formula a in A is assigned a propositional variable pa
Q'i = the Boolean query obtained from Qi by substituting each a with pa
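A minimal sketch of this step, assuming a hypothetical nested-tuple encoding of queries: every distinct atomic sub-formula receives a fresh propositional variable, and an atom shared by several queries receives a single one. The two example queries are hypothetical; they share one atom, so three atoms give three variables.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding: ("atom", attr, op, const), ("and", ...), ("or", ...), ("not", q)
Query = tuple

def collect_atoms(q: Query, out: List[Query]) -> None:
    """Append the atomic sub-formulas of q to out, skipping ones already present."""
    if q[0] == "atom":
        if q not in out:
            out.append(q)
        return
    for sub in q[1:]:
        collect_atoms(sub, out)

def propositionalize(queries: List[Query]) -> Tuple[Dict[Query, str], List[Query]]:
    """Assign p0, p1, ... to the distinct atoms and substitute them into every query."""
    atoms: List[Query] = []
    for q in queries:
        collect_atoms(q, atoms)
    var = {a: f"p{i}" for i, a in enumerate(atoms)}

    def subst(q: Query):
        if q[0] == "atom":
            return var[q]
        return (q[0], *(subst(sub) for sub in q[1:]))

    return var, [subst(q) for q in queries]

q1 = ("and", ("atom", "company", "=", "IBM"), ("atom", "price", "<", 1000))
q2 = ("or", ("atom", "company", "=", "IBM"), ("atom", "product", "substr", "PC"))
var, props = propositionalize([q1, q2])
print(len(var))  # 3
print(props)     # [('and', 'p0', 'p1'), ('or', 'p0', 'p2')]
```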
42. Example: Query BDDs
Let Q1 and Q2 be two received subscriptions
Then A contains three atomic sub-formulas, so three propositional variables are needed
43. Example: Query BDDs
Let the variable order be
SBDD corresponding
to the queries
44. Query Matching: SiSL
Use the EvalBDD algorithm for query matching
A query Qi is considered matched if the BDD node corresponding to Qi evaluates to 1.
Bottom-up evaluation ensures that shared sub-queries are evaluated only once.
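An end-to-end toy sketch of SBDD-based matching for total (SiSL) messages, under several assumptions: the three atoms and two queries are hypothetical, the shared BDD is built by brute-force Shannon expansion with a unique table (so identical nodes are shared across queries and redundant tests are dropped), and matching is an EvalBDD-style bottom-up pass. It is not the paper's implementation; a BDD package would normally handle construction.

```python
from typing import Callable, Dict, List, Tuple

class SharedBDD:
    """Toy shared ROBDD: ids 0 and 1 are the terminals; id i >= 2 is (var, low, high)."""

    def __init__(self, num_vars: int) -> None:
        self.num_vars = num_vars
        self.nodes: List[Tuple[int, int, int]] = [(-1, -1, -1), (-1, -1, -1)]
        self.unique: Dict[Tuple[int, int, int], int] = {}

    def mk(self, var: int, low: int, high: int) -> int:
        """Reuse an identical node if one exists (sharing across queries);
        skip nodes whose two edges agree (redundant tests)."""
        if low == high:
            return low
        key = (var, low, high)
        if key not in self.unique:
            self.unique[key] = len(self.nodes)
            self.nodes.append(key)
        return self.unique[key]

    def build(self, fn: Callable[[Tuple[bool, ...]], bool],
              prefix: Tuple[bool, ...] = ()) -> int:
        """Shannon-expand fn on variables 0, 1, ... (exponential: toy sizes only)."""
        if len(prefix) == self.num_vars:
            return int(fn(prefix))
        low = self.build(fn, prefix + (False,))
        high = self.build(fn, prefix + (True,))
        return self.mk(len(prefix), low, high)

    def evaluate(self, assignment: Tuple[bool, ...]) -> List[int]:
        """One bottom-up sweep: children always have smaller ids than parents."""
        value = [0, 1] + [0] * (len(self.nodes) - 2)
        for i in range(2, len(self.nodes)):
            var, low, high = self.nodes[i]
            value[i] = value[high] if assignment[var] else value[low]
        return value

# Hypothetical atoms (one propositional variable each) and queries over them.
ATOMS = [
    lambda m: m["company"] == "IBM",   # p0
    lambda m: m["price"] < 1000,       # p1
    lambda m: "PC" in m["product"],    # p2
]
QUERIES = {
    "Q1": lambda p: p[0] and p[1],
    "Q2": lambda p: p[0] or p[2],
}

bdd = SharedBDD(num_vars=len(ATOMS))
OUTPUTS = {name: bdd.build(q) for name, q in QUERIES.items()}

def match(m: dict) -> List[str]:
    """Evaluate each atom once, run one bottom-up pass, report queries whose node is 1."""
    assignment = tuple(atom(m) for atom in ATOMS)
    value = bdd.evaluate(assignment)
    return [name for name, node in OUTPUTS.items() if value[node] == 1]

print(match({"company": "IBM", "price": 1500, "product": "PC AT"}))  # ['Q2']
```

Per message, each atom is evaluated once and each BDD node once, no matter how many queries share that node.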
45. Query Matching: DeSL
Same as handling complete (total) messages
When a message is received, it is extended to a total message before the matching is performed.
46. Query Matching: StSL
Recall that a message m matches a subscription Q iff m is adequate for Q and m satisfies Q.
A modified EvalBDD can be used to perform faster matching
Key Ideas
◦ An undefined atom renders all sub-formulas in which it occurs undefined.
◦ Treat * as a new value, undefined
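The slides give only the key ideas of the modified evaluation, so the following is an interpretation, not the paper's MVEvalBDD. Atoms over attributes the message leaves undefined get a third value U, and a node becomes U if its own atom is U or either successor is U, so U reaches every node whose sub-BDD mentions an undefined atom. A query node therefore evaluates to 1 only if every atom in its sub-BDD is defined, which approximates the adequacy requirement (a reduced BDD can omit atoms that do not affect the function). It reuses the x AND (y OR z) array layout with hypothetical atoms.

```python
from typing import Any, Callable, List, Tuple, Union

U = "U"                    # third truth value: undefined
Truth = Union[int, str]    # 0, 1, or U

# x AND (y OR z) in a hypothetical array layout (0-based; ids 0/1 are terminals,
# successors always have smaller ids). Atom k labels the nodes with label k.
low = [-1, -1, 0, 2, 0]
high = [-1, -1, 1, 1, 3]
label = [-1, -1, 2, 1, 0]

# Hypothetical atoms as (attribute, test): x, y, z in that order.
ATOMS: List[Tuple[str, Callable[[Any], bool]]] = [
    ("company", lambda v: v == "IBM"),   # atom 0 (x)
    ("price", lambda v: v < 1000),       # atom 1 (y)
    ("product", lambda v: "PC" in v),    # atom 2 (z)
]

def atom_values(m: dict) -> List[Truth]:
    """An atom over an attribute the message leaves undefined (*) gets the value U."""
    return [U if attr not in m else int(test(m[attr])) for attr, test in ATOMS]

def mv_eval(atoms: List[Truth]) -> List[Truth]:
    """Three-valued bottom-up pass: node i is U if its atom or either successor is U."""
    value: List[Truth] = [0, 1] + [0] * (len(label) - 2)
    for i in range(2, len(label)):
        a = atoms[label[i]]
        if U in (a, value[low[i]], value[high[i]]):
            value[i] = U
        else:
            value[i] = value[high[i]] if a else value[low[i]]
    return value

print(mv_eval(atom_values({"company": "IBM", "price": 900, "product": "PC AT"}))[4])  # 1
print(mv_eval(atom_values({"company": "IBM", "price": 900}))[4])                     # U
```

In the second call the two-valued query would be true (company = "IBM" and price < 1000), but the message does not define product, so the query is not matched.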
47. Query Matching: StSL
MVEvalBDD for StSL is significantly
faster than EvalBDD for SiSL
48. Roadmap
Research problem at hand
Content-based Publish-Subscribe
Subscription Query Language
BDD Semantics
BDD Based matching
Experimental Results
Discussion (Pros and Cons)
49. # Nodes in SBDD vs. # Subscriptions
The number of nodes scales almost linearly with the number of subscriptions
◦ High scalability
Restriction further reduces the node count, lowering memory requirements
50. Matching time for SiSL and StSL
Time for StSL queries
Inputs: number of subscription queries and message density (how close a message is to being total)
Partial messages can be matched quickly.
51. Roadmap
Research problem at hand
Content-based Publish-Subscribe
Subscription Query Language
BDD Semantics
BDD Based matching
Experimental Results
Discussion (Pros and Cons)
52. Variable Ordering vs. BDD size
Variable ordering has a tremendous influence on BDD size.
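The slide's figure is not reproduced here. As a standard textbook illustration (not the slide's own example), the function (x1 AND y1) OR (x2 AND y2) OR (x3 AND y3) has a small reduced BDD when each xi is placed next to yi and a much larger one when all xi come first; in general the gap is linear versus exponential in the number of pairs. The sketch counts non-terminal nodes by brute-force Shannon expansion with a unique table, which is only feasible for a handful of variables.

```python
from typing import Callable, Dict, Sequence, Tuple

def robdd_size(fn: Callable[[Dict[str, bool]], bool], order: Sequence[str]) -> int:
    """Non-terminal node count of the reduced ordered BDD of fn under `order`."""
    unique: Dict[Tuple[str, int, int], int] = {}

    def mk(var: str, low: int, high: int) -> int:
        if low == high:    # redundant test: no node needed
            return low
        # identical (var, low, high) triples share one node; ids 0/1 are terminals
        return unique.setdefault((var, low, high), len(unique) + 2)

    def build(env: Dict[str, bool], depth: int) -> int:
        if depth == len(order):
            return int(fn(env))
        var = order[depth]
        low = build({**env, var: False}, depth + 1)
        high = build({**env, var: True}, depth + 1)
        return mk(var, low, high)

    build({}, 0)
    return len(unique)

def pairs(env: Dict[str, bool]) -> bool:
    """(x1 AND y1) OR (x2 AND y2) OR (x3 AND y3)"""
    return any(env[f"x{i}"] and env[f"y{i}"] for i in range(1, 4))

interleaved = ["x1", "y1", "x2", "y2", "x3", "y3"]
separated = ["x1", "x2", "x3", "y1", "y2", "y3"]
print(robdd_size(pairs, interleaved), robdd_size(pairs, separated))  # 6 14
```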
53. Pros
Introduces a well-defined semantics to describe the matching process in publish-subscribe systems
Treating matching as satisfiability checking on an SBDD allows multiple subscriptions to be checked incrementally
Scalable
StSL matching is more efficient than SiSL matching
54. Cons/Improvements
Does not describe any heuristics for selecting the variable ordering (finding an optimal ordering is NP-hard)
◦ Can we order based on the significance of the attributes involved?
Does not explore the possibility of eliminating redundancies due to semantically related atomic sub-formulas (e.g., price = 100 and price > 80) (again NP-hard)
◦ Can we further reduce the node count by exploiting the semantics without causing side effects?
Matching efficiency is not compared with that of existing systems
55. Conclusion
Two major contributions
◦ A precise semantics for matching messages to subscriptions
◦ Modeling filtering as a satisfiability check on BDDs