DHT

Presentation slides for the DBISP2P Workshop 2003.



  1. Building Content-Based Publish/Subscribe Systems with Distributed Hash Tables
     David Tam, Reza Azimi, Hans-Arno Jacobsen
     University of Toronto, Canada
     September 8, 2003
  2. Introduction: Publish/Subscribe Systems
     • Push model (a.k.a. event notification): subscribe, publish, match
     • Applications: stock market, auctions, eBay, news
     • Two types:
       • topic-based: ≈ Usenet newsgroup topics
       • content-based: attribute-value pairs, e.g. (attr1 = value1) ∧ (attr2 = value2) ∧ (attr3 > value3) (a minimal matching sketch follows the slide list)
  3. The Problem: Content-Based Publish/Subscribe
     • Traditionally centralized: scalability?
     • More recently distributed, e.g. SIENA: a small set of brokers, not P2P
     • How about fully distributed? Exploit P2P: 1000s of brokers
  4. Proposed Solution: Use Distributed Hash Tables
     • DHTs: hash buckets are mapped to P2P nodes (a key-to-node sketch follows the slide list)
     • Why DHTs? Scalable, fault-tolerant, load-balancing
     • Challenges: subscribing, publishing, and matching must be distributed yet coordinated and lightweight; matching is difficult
  5.–9. Basic Scheme (diagram build across five slides)
     • A matching publisher and subscriber must come up with the same hash keys based on the content
     • Diagram: distributed hash table buckets map onto the publish/subscribe system; a subscriber's subscription is stored at the key's home node, and a publisher's publication is routed to that same home node, where it meets the stored subscription
  10. Naïve Approach
     • Diagram: subscription and publication attribute-value lists (Attr1–Attr7) are fed through hash functions to produce keys; some attributes are left unspecified on one side
     • Publisher must produce keys for all possible attribute combinations: 2^N keys for each publication (an enumeration sketch follows the slide list)
     • Bottleneck at the hash bucket node: subscribing, publishing, matching
  11. Our Approach: Domain Schema
     • Eliminates the 2^N problem
     • Similar to an RDBMS schema: a set of attribute names, a set of value constraints, a set of indices
     • Create hash keys for the indices only
     • Choose groups of attributes that are common, but whose combinations of values are rare
     • The schema is well-known
  12. Hash Key Composition
     • Indices: {attr1}, {attr1, attr4}, {attr6, attr7}
     • Diagram: publication and subscription hash only the indexed attribute groups (Key1 from {attr1}, Key2 from {attr1, attr4}, Key3 from {attr6, attr7}); a subscription that leaves some attributes unspecified derives only a subset of the keys (a key-composition sketch follows the slide list)
     • Possible false positives, because matching on keys is only partial; they are filtered by the system
     • Possible duplicate notifications, because a subscription can have multiple keys
  13.–15. Our Approach (cont'd): Multicast Trees (diagram build across three slides)
     • Eliminates the bottleneck at hash bucket nodes
     • Distributed subscribing, publishing, matching
     • Diagram: a multicast tree rooted at the home node (hash bucket node) connects the existing subscribers; new subscribers attach to the tree, possibly through non-subscriber nodes (a tree-building sketch follows the slide list)
  16. Handling Range Queries
     • The hash function ruins locality: e.g. inputs 1, 2, 3, 4 hash to 40, 90, 120, 170
     • Divide the range of values into intervals and hash on the interval labels (an interval-labelling sketch follows the slide list)
     • Example, RAM attribute: boundaries 0, 128, 256, 512, ∞ give intervals labelled A, B, C, D
     • For RAM > 384, submit the hash keys RAM = C and RAM = D
     • Intervals can be sized according to the value probability distribution
  17. Implementation & Evaluation
     • Main goal: scalability; metric: message traffic
     • Built using the Pastry DHT and Scribe multicast trees
     • Workload generator: uniformly random distributions
  18. Event Scalability: 1000 nodes
     • Need a well-designed schema with low false positives
  19. Node Scalability: 40000 subscriptions and publications
  20. Range Query Scalability: 1000 nodes
     • Multicast tree benefits: comparing 1 range attribute vs. 0 at 40000 subscriptions and publications, 2.33× the messages were expected but only about 1.6× were measured
     • Subscription costs decrease
  21. Range Query Scalability: 40000 subscriptions and publications
  22. Conclusion
     • Method: DHT + domain schema
     • Scales to 1000s of nodes
     • Multicast trees are important
     • An interesting point in the design space: some restrictions on the expression of content; must adhere to the domain schema
     • Future work: range query techniques, examining multicast trees in detail, locality-sensitive workload distributions, real-world workloads, detailed modelling of the P2P network, fault tolerance
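A minimal sketch of the content-based model from slide 2: a subscription is a conjunction of attribute predicates, and a publication matches when every predicate holds. The Subscription class, operator table, and attribute names are illustrative assumptions, not the authors' implementation.

```python
# Sketch of content-based matching (slide 2). Names and operators are
# illustrative assumptions, not the system described in the slides.
from dataclasses import dataclass
from typing import Any, Callable, Dict

OPS: Dict[str, Callable[[Any, Any], bool]] = {
    "=": lambda a, b: a == b,
    ">": lambda a, b: a > b,
    "<": lambda a, b: a < b,
}

@dataclass
class Subscription:
    # e.g. {"symbol": ("=", "IBM"), "price": (">", 100)}
    predicates: Dict[str, tuple]

def matches(subscription: Subscription, publication: Dict[str, Any]) -> bool:
    """A publication matches if every predicate in the subscription holds."""
    for attr, (op, value) in subscription.predicates.items():
        if attr not in publication or not OPS[op](publication[attr], value):
            return False
    return True

# Usage: a stock-quote publication against a conjunctive subscription.
sub = Subscription({"symbol": ("=", "IBM"), "price": (">", 100)})
print(matches(sub, {"symbol": "IBM", "price": 120.5, "volume": 3000}))  # True
```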
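Slide 4 relies on a DHT mapping hash buckets to P2P nodes. The sketch below assigns a key to the node whose id is numerically closest on a circular id space; this is a generic approximation of what a DHT such as Pastry provides, with made-up node names and a 32-bit id space chosen only for brevity.

```python
# Sketch of the DHT idea on slide 4: keys (hash buckets) are mapped to the
# P2P node whose id is closest on a circular identifier space.
import hashlib

ID_SPACE = 2 ** 32

def dht_id(text: str) -> int:
    """Derive a 32-bit identifier from a string (nodes and keys alike)."""
    return int.from_bytes(hashlib.sha1(text.encode()).digest()[:4], "big")

def home_node(key: str, node_ids: list) -> int:
    """Return the node id responsible for a key: closest on the circle."""
    k = dht_id(key)
    return min(node_ids,
               key=lambda n: min((n - k) % ID_SPACE, (k - n) % ID_SPACE))

nodes = [dht_id(f"node-{i}") for i in range(8)]
print(home_node("attr1=value1", nodes))   # id of the key's home node
```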
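The naive approach on slide 10 forces a publisher to cover every attribute combination a subscriber might have hashed, which is on the order of 2^N keys per publication. A small enumeration sketch, with made-up attribute names and key encoding, makes the blow-up concrete.

```python
# Sketch of the naive approach on slide 10: hash every non-empty subset of a
# publication's attribute-value pairs, i.e. on the order of 2^N keys.
from itertools import combinations
import hashlib

def naive_publication_keys(publication: dict) -> list:
    items = sorted(publication.items())          # canonical attribute order
    keys = []
    for r in range(1, len(items) + 1):
        for subset in combinations(items, r):    # every attribute combination
            text = ";".join(f"{a}={v}" for a, v in subset)
            keys.append(hashlib.sha1(text.encode()).hexdigest())
    return keys

pub = {"attr1": "v1", "attr2": "v2", "attr3": "v3", "attr4": "v4"}
print(len(naive_publication_keys(pub)))   # 2^4 - 1 = 15 keys; grows as 2^N
```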
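Slides 11 and 12 replace subset enumeration with a well-known domain schema: keys are composed only for the schema's indices. The sketch below uses the index sets listed on slide 12; the key encoding, hash choice, and the rule that a subscription contributes a key only for indices it fully specifies are assumptions about how such composition could look.

```python
# Sketch of hash key composition over a domain schema (slides 11-12):
# keys are built only for the schema's indices, never for arbitrary subsets.
import hashlib

INDICES = [("attr1",), ("attr1", "attr4"), ("attr6", "attr7")]  # slide 12

def index_key(index: tuple, values: dict) -> str:
    text = ";".join(f"{a}={values[a]}" for a in index)
    return hashlib.sha1(text.encode()).hexdigest()

def publication_keys(publication: dict) -> list:
    return [index_key(ix, publication) for ix in INDICES
            if all(a in publication for a in ix)]

def subscription_keys(subscription: dict) -> list:
    # Only indices fully covered by the subscription's equality predicates.
    return [index_key(ix, subscription) for ix in INDICES
            if all(a in subscription for a in ix)]

pub = {f"attr{i}": f"value{i}" for i in range(1, 8)}   # all attributes set
sub = {"attr1": "value1", "attr4": "value4"}           # partial subscription
print(set(subscription_keys(sub)) & set(publication_keys(pub)))  # shared keys
```

Because only the indexed attributes feed the key, any non-indexed predicates must still be checked after delivery, which is the source of the false positives mentioned on slide 12.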
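Slides 13 to 15 remove the hot spot at the home node with a multicast tree rooted there. The sketch below is a greatly simplified, Scribe-like construction under stated assumptions: a join travels toward the home node and each hop remembers the child it came from; route() is a placeholder overlay path, not Pastry's prefix routing.

```python
# Simplified sketch of the multicast-tree idea on slides 13-15: joins are
# routed toward the home node, and each node on the path records its child,
# so publications can later be forwarded down the tree.
from collections import defaultdict

children = defaultdict(set)   # node -> set of child nodes in the tree

def route(src: str, home: str) -> list:
    # Placeholder overlay route from src to the home node.
    return [src, "relay-1", "relay-2", home]

def join(subscriber: str, home: str) -> None:
    path = route(subscriber, home)
    for child, parent in zip(path, path[1:]):
        if child in children[parent]:
            break                        # rest of the path already in the tree
        children[parent].add(child)

def publish(home: str, event: dict) -> None:
    stack = [home]
    while stack:                         # forward the event down the tree
        node = stack.pop()
        print(f"{node} delivers/forwards {event}")
        stack.extend(children[node])

join("sub-A", "home"); join("sub-B", "home")
publish("home", {"attr1": "value1"})
```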
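Slide 16 handles range predicates by hashing interval labels instead of raw values. The sketch reuses the slide's RAM example (boundaries 0, 128, 256, 512, ∞ labelled A to D); the bisect-based lookup and the half-open interval convention are assumptions.

```python
# Sketch of the range-query trick on slide 16: bucket values into labelled
# intervals and hash the labels, since hashing raw values destroys locality.
from bisect import bisect_right

BOUNDARIES = [128, 256, 512]        # interval edges: 0 |128| 256 |512| inf
LABELS = ["A", "B", "C", "D"]

def label_of(value: float) -> str:
    return LABELS[bisect_right(BOUNDARIES, value)]

def labels_for_greater_than(threshold: float) -> list:
    """Interval labels a subscription 'RAM > threshold' must subscribe to."""
    return LABELS[LABELS.index(label_of(threshold)):]

print(label_of(384))                    # 'C'
print(labels_for_greater_than(384))     # ['C', 'D']  -> keys RAM=C, RAM=D
```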
