Building Content-Based Publish/Subscribe Systems with Distributed Hash Tables

David Tam, Reza Azimi, Hans-Arno Jacobsen
Presentation slides for the DBISP2P Workshop, 2003

1. Building Content-Based Publish/Subscribe Systems with Distributed Hash Tables
   David Tam, Reza Azimi, Hans-Arno Jacobsen
   University of Toronto, Canada
   September 8, 2003
2. Introduction: Publish/Subscribe Systems
   - Push model (a.k.a. event notification): subscribe, publish, match
   - Applications: stock market, auctions, eBay, news
   - Two types:
     - topic-based: ≈ Usenet newsgroup topics
     - content-based: attribute-value pairs, e.g. (attr1 = value1) ∧ (attr2 = value2) ∧ (attr3 > value3)
3. The Problem: Content-Based Publish/Subscribe
   - Traditionally centralized: scalability?
   - More recently distributed, e.g. SIENA: a small set of brokers, not P2P
   - How about fully distributed? Exploit P2P, with 1000s of brokers
4. Proposed Solution: Use Distributed Hash Tables
   - DHTs: hash buckets are mapped to P2P nodes
   - Why DHTs? Scalability, fault tolerance, load balancing
   - Challenges: subscribing, publishing, and matching must be distributed yet coordinated and lightweight; matching is difficult
5-9. Basic Scheme
   - A matching publisher and subscriber must come up with the same hash keys based on the content.
   - [Figure, animated over slides 5-9: Distributed Hash Table buckets in a distributed publish/subscribe system; the subscription is routed to a home node, and the publisher's publication reaches the same home node, where it is matched against the subscriber.]
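The heart of the basic scheme is that publisher and subscriber derive the key with the same deterministic function, so the DHT routes both to the same home node. Below is a minimal sketch of that idea, assuming a toy in-memory stand-in for the DHT keyed by SHA-1; the names home_nodes, subscribe, and publish are illustrative, not the authors' API, and the real system routes keys through a P2P overlay rather than a local dictionary.

```python
import hashlib
from collections import defaultdict

# Toy stand-in for the DHT: hash key -> subscriptions stored at that home node.
home_nodes = defaultdict(list)

def content_key(attrs):
    """Derive a deterministic hash key from attribute-value pairs.

    Publisher and subscriber must serialize the content identically
    (sorted attribute names here) so they arrive at the same key.
    """
    canonical = ";".join(f"{name}={value}" for name, value in sorted(attrs.items()))
    return hashlib.sha1(canonical.encode()).hexdigest()

def subscribe(subscriber, attrs):
    # The subscription is routed to, and stored at, the key's home node.
    home_nodes[content_key(attrs)].append((subscriber, attrs))

def publish(attrs):
    # The publication is routed to the same home node, where matching happens.
    return [sub for sub, _ in home_nodes[content_key(attrs)]]

subscribe("alice", {"symbol": "IBM", "exchange": "NYSE"})
print(publish({"symbol": "IBM", "exchange": "NYSE"}))   # ['alice']
```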
10. Naïve Approach
    - [Figure: a publication with Attr1-Attr7 all specified and a subscription specifying only Attr1, Attr4, and Attr6 are each run through the hash key function.]
    - The publisher must produce keys for all possible attribute combinations: 2^N keys for each publication
    - Bottleneck at the hash bucket node: subscribing, publishing, matching
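The 2^N blow-up arises because the publisher cannot know which attribute subset any subscriber chose, so it must emit a key for every non-empty combination. A short sketch of that enumeration (the function name is illustrative; this is exactly what the schema-based approach on the next slide avoids):

```python
from itertools import combinations
import hashlib

def naive_publication_keys(attrs):
    """Emit one key per non-empty attribute combination: 2^N - 1 keys."""
    keys = []
    names = sorted(attrs)
    for r in range(1, len(names) + 1):
        for subset in combinations(names, r):
            canonical = ";".join(f"{n}={attrs[n]}" for n in subset)
            keys.append(hashlib.sha1(canonical.encode()).hexdigest())
    return keys

pub = {f"attr{i}": f"value{i}" for i in range(1, 8)}   # 7 attributes
print(len(naive_publication_keys(pub)))                # 127 = 2^7 - 1 keys
```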
11. Our Approach: Domain Schema
    - Eliminates the 2^N problem
    - Similar to an RDBMS schema: a set of attribute names, a set of value constraints, a set of indices
    - Create hash keys for the indices only
    - Choose groups of attributes that are common, but whose combinations of values are rare
    - The schema is well-known (shared by all publishers and subscribers)
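One way to picture the domain schema is as a small, shared description of the attribute names, the value constraints, and the indices that are allowed to generate keys. The sketch below is a hypothetical rendering under those assumptions; the field names and the stock-market example are illustrative, not taken from the paper.

```python
from dataclasses import dataclass

@dataclass
class DomainSchema:
    """Shared, well-known description of the event domain."""
    attributes: set    # allowed attribute names
    constraints: dict  # attribute -> predicate on allowed values
    indices: list      # attribute groups that generate hash keys

stock_schema = DomainSchema(
    attributes={"symbol", "exchange", "price", "volume"},
    constraints={"price": lambda v: v >= 0, "volume": lambda v: v >= 0},
    # Good indices: attributes that are commonly specified, but whose
    # value combinations are rare.
    indices=[("symbol",), ("symbol", "exchange")],
)
```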
12. Hash Key Composition
    - Indices: {attr1}, {attr1, attr4}, {attr6, attr7}
    - [Figure: the publication (Attr1-Attr7 all specified) produces Key1, Key2, and Key3, one per index; the subscription (only Attr1, Attr4, Attr6 specified) produces Key1 and Key2.]
    - Possible false positives, because keys reflect only a partial match; these are filtered by the system
    - Possible duplicate notifications, because a subscription can map to multiple keys
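Key composition then touches only the schema's indices: a publication produces one key per index it covers, while a subscription produces keys only for the indices whose attributes it fully specifies, which is where the partial matches and duplicate notifications come from. A hedged sketch of that composition (helper names are illustrative):

```python
import hashlib

INDICES = [("attr1",), ("attr1", "attr4"), ("attr6", "attr7")]

def index_keys(attrs, indices=INDICES):
    """One key per index whose attributes are all present in `attrs`."""
    keys = {}
    for index in indices:
        if all(name in attrs for name in index):
            canonical = ";".join(f"{n}={attrs[n]}" for n in index)
            keys[index] = hashlib.sha1(canonical.encode()).hexdigest()
    return keys

publication = {f"attr{i}": f"value{i}" for i in range(1, 8)}
subscription = {"attr1": "value1", "attr4": "value4", "attr6": "value6"}

print(len(index_keys(publication)))   # 3 keys: Key1, Key2, Key3
print(len(index_keys(subscription)))  # 2 keys: Key1, Key2 (attr7 missing for Key3)
```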
13-15. Our Approach (cont'd): Multicast Trees
    - Eliminates the bottleneck at hash bucket nodes
    - Distributed subscribing, publishing, and matching
    - [Figure, animated over slides 13-15: the home node (hash bucket node) roots a multicast tree of existing subscribers; a new subscriber joins the tree, possibly through non-subscriber nodes.]
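Since the slides say the trees are built with Scribe, the join can be pictured as a message routed toward the home node that stops at the first node already on the tree. The following is only a minimal sketch of that Scribe-style idea, not the authors' implementation: route_to_home is a hypothetical helper standing in for Pastry's overlay routing, and the data structures are purely illustrative.

```python
# Minimal sketch of a Scribe-style multicast-tree join.
children = {}       # node -> set of children in the multicast tree
on_tree = set()     # nodes (subscribers or forwarders) already on the tree

def join_tree(new_subscriber, route_to_home):
    """Attach the subscriber at the first on-tree node along its route."""
    on_tree.add(new_subscriber)
    prev = new_subscriber
    for hop in route_to_home(new_subscriber):
        children.setdefault(hop, set()).add(prev)
        if hop in on_tree:      # the rest of the tree already exists
            return
        on_tree.add(hop)        # a non-subscriber becomes a forwarder
        prev = hop

def disseminate(node, publication, deliver):
    """Publications flow down the tree from the home node.

    (In the real system, non-subscriber forwarders only relay.)
    """
    deliver(node, publication)
    for child in children.get(node, ()):
        disseminate(child, publication, deliver)

# Example: N1 and N2 both route to home node H via forwarder F1.
join_tree("N1", lambda n: ["F1", "H"])
join_tree("N2", lambda n: ["F1", "H"])   # stops at F1, already on the tree
disseminate("H", {"attr1": "value1"}, lambda node, pub: print(node, pub))
```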
16. Handling Range Queries
    - The hash function ruins locality: e.g. inputs 1, 2, 3, 4 hash to 40, 90, 120, 170
    - Divide the range of values into intervals and hash on the interval labels
    - Example: a RAM attribute with interval boundaries 0, 128, 256, 512, ∞ and labels A, B, C, D
    - For RAM > 384, submit hash keys for RAM = C and RAM = D
    - Intervals can be sized according to the value's probability distribution
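The interval trick can be sketched directly from the slide's RAM example: well-known boundaries and labels, with a range predicate mapped to the label of every interval it overlaps. The boundaries and labels below follow the slide; the helper names are illustrative assumptions.

```python
import hashlib
import math

# Well-known interval boundaries and labels for the RAM attribute (from the slide).
BOUNDARIES = [0, 128, 256, 512, math.inf]
LABELS = ["A", "B", "C", "D"]

def labels_for_range(low, high=math.inf):
    """Labels of every interval that overlaps the query range [low, high)."""
    hits = []
    for i, label in enumerate(LABELS):
        lo, hi = BOUNDARIES[i], BOUNDARIES[i + 1]
        if low < hi and high > lo:      # interval overlap test
            hits.append(label)
    return hits

def range_keys(attr, low, high=math.inf):
    # One hash key per overlapping interval label, e.g. "RAM=C", "RAM=D".
    return [hashlib.sha1(f"{attr}={label}".encode()).hexdigest()
            for label in labels_for_range(low, high)]

print(labels_for_range(384))          # ['C', 'D']
print(len(range_keys("RAM", 384)))    # 2 keys; false positives (e.g. 300 MB) are filtered later
```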
17. Implementation & Evaluation
    - Main goal: scalability
    - Metric: message traffic
    - Built using the Pastry DHT and Scribe multicast trees
    - Workload generator: uniformly random distributions
18. Event Scalability: 1000 nodes
    - Need a well-designed schema with low false positives
19. Node Scalability: 40,000 subscriptions and publications
20. Range Query Scalability: 1000 nodes
    - Multicast tree benefits: e.g. 1 range vs. 0 ranges, at 40,000 subscriptions and publications
    - Expected 2.33× the messages, but measured only 1.6×
    - Subscription costs decrease
21. Range Query Scalability: 40,000 subscriptions and publications
22. Conclusion
    - Method: DHT + domain schema
    - Scales to 1000s of nodes
    - Multicast trees are important
    - An interesting point in the design space: some restrictions on the expression of content; must adhere to the domain schema
    - Future work: range query techniques, examining multicast trees in detail, locality-sensitive workload distributions, real-world workloads, detailed modelling of the P2P network, fault tolerance