Distributed Hash Table

Published in: Technology, Education
  1. 1. Distributed Hash Table
  2. 2. Definition <ul><li>A distributed hash table (DHT) is a class of decentralized distributed system that provides a lookup service similar to a hash table, storing (key, value) pairs. </li></ul><ul><li>Responsibility for maintaining the mapping from keys to values is distributed among the nodes in such a way that a change in the set of participants causes minimal disruption. </li></ul>
  3. 3. DHT Structure <ul><li>A keyspace partitioning scheme splits ownership of the keyspace among the participating nodes. </li></ul><ul><li>An overlay network connects the nodes, allowing them to find the owner of any given key in the keyspace. </li></ul><ul><li>A hash algorithm (for example SHA-1) maps keys and node names to identifiers. </li></ul><ul><li>Consistent hashing ensures that the removal or addition of one node changes only the set of keys owned by the nodes with adjacent IDs and leaves all other nodes unaffected. </li></ul>
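The consistent-hashing property on this slide can be illustrated with a minimal sketch. The names `ring_id` and `HashRing`, the 32-bit identifier space, and the "first node clockwise owns the key" rule are assumptions made for the example, not details taken from the slides.

```python
import bisect
import hashlib


def ring_id(name: str, bits: int = 32) -> int:
    """Map a node name or key to a point on the ring using SHA-1."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (1 << bits)


class HashRing:
    """Hypothetical ring: a key is owned by the first node at or after its ID."""

    def __init__(self, nodes=()):
        self._ids = []      # sorted node IDs
        self._names = {}    # node ID -> node name
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        nid = ring_id(node)
        if nid not in self._names:          # ignore (unlikely) ID collisions
            bisect.insort(self._ids, nid)
            self._names[nid] = node

    def remove(self, node: str) -> None:
        nid = ring_id(node)
        if self._names.get(nid) == node:
            self._ids.remove(nid)
            del self._names[nid]

    def owner(self, key: str) -> str:
        """Return the node responsible for key (the ring must be non-empty)."""
        index = bisect.bisect_left(self._ids, ring_id(key)) % len(self._ids)
        return self._names[self._ids[index]]
```

In this sketch, adding a node moves only the keys that the new node takes over from its clockwise neighbor; every other key keeps its owner, which is the low-disruption behavior the slide describes.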
  4. 4. How Lookup Works in a DHT <ul><li>The leaf set consists of a node's successors and predecessors. </li></ul><ul><ul><li>It is all that is needed for correctness. </li></ul></ul><ul><li>The routing table matches successively longer prefixes of the lookup ID. </li></ul><ul><ul><li>It is all that is needed for performance. </li></ul></ul>[Figure labels: Source, Lookup ID, Response]
  5. 5. Chord <ul><li>One of the original distributed hash tables, developed at MIT. </li></ul><ul><li>Nodes are arranged in a circle. </li></ul><ul><li>Nodes and keys are assigned m-bit identifiers using consistent hashing. </li></ul>
  6. 6. Chord Properties <ul><li>Efficient directory operations </li></ul><ul><ul><li>Insertion, deletion, lookup </li></ul></ul><ul><li>Good analytical properties </li></ul><ul><ul><li>O(log N) routing table size </li></ul></ul><ul><ul><li>O(log N) logical steps to reach the successor of a key k </li></ul></ul><ul><li>High maintenance cost </li></ul><ul><ul><li>A node join or leave induces state changes on other nodes </li></ul></ul>
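To make the routing-table and lookup-step claims concrete, here is a small simulation of Chord-style lookup with finger tables. It is a sketch under assumptions: the ring uses `M = 6` bits, every finger table is computed from a global list of node IDs (a real node only has local knowledge), and the helper names `between`, `successor`, `finger_table`, and `lookup` are illustrative.

```python
M = 6              # identifier bits (kept small for the example)
RING = 1 << M      # size of the identifier space


def between(x, a, b):
    """True if x lies in the open ring interval (a, b), walking clockwise."""
    if a < b:
        return a < x < b
    return x > a or x < b


def successor(ids, k):
    """First node ID at or after k on the ring; ids must be sorted."""
    for n in ids:
        if n >= k:
            return n
    return ids[0]  # wrap around past the largest ID


def finger_table(ids, n):
    """Finger i of node n points at successor(n + 2**i); M entries in total."""
    return [successor(ids, (n + (1 << i)) % RING) for i in range(M)]


def lookup(ids, start, key):
    """Route from node `start` to the successor of `key`; returns (owner, hops)."""
    ids = sorted(ids)
    node, hops = start, 0
    while True:
        fingers = finger_table(ids, node)
        succ = fingers[0]
        if key == succ or between(key, node, succ):
            return succ, hops
        # Forward to the closest finger that precedes the key.
        node = next(f for f in reversed(fingers) if between(f, node, key))
        hops += 1
```

With the node IDs `[1, 8, 14, 21, 32, 38, 42, 48, 51, 56]`, `lookup(ids, 8, 54)` forwards from node 8 to 42, then to 51, and returns `(56, 2)`. Each node keeps only `M` fingers, which is where the logarithmic table size comes from.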
  7. 7. Pastry <ul><li>Circular namespace </li></ul><ul><li>Routing table: </li></ul><ul><ul><li>Peer p has ID IDp. </li></ul></ul><ul><ul><li>For each prefix of IDp, keep a set of peers whose IDs share that prefix and differ from each other in the next digit. </li></ul></ul><ul><li>Routing: </li></ul><ul><ul><li>Choose a peer whose ID shares the longest prefix with the target ID. </li></ul></ul><ul><ul><li>Failing that, choose a peer whose ID is numerically closest to the target ID. </li></ul></ul><ul><li>Analytical properties similar to Chord's </li></ul>
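The two routing rules can be sketched as a single next-hop choice. This is a simplification, not Pastry's full algorithm: IDs are assumed to be equal-length hexadecimal strings, numeric distance ignores wraparound on the circular namespace, and `shared_prefix_len` and `next_hop` are hypothetical helper names.

```python
def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading digits that two IDs have in common."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count


def next_hop(local_id: str, peers, target: str) -> str:
    """Pick the known peer that best matches the target ID.

    Preference order: longer shared prefix first, then smaller numeric
    distance. Returns local_id when no peer is a better match.
    """
    def score(node_id: str):
        distance = abs(int(node_id, 16) - int(target, 16))
        return (shared_prefix_len(node_id, target), -distance)

    best = local_id
    for peer in peers:
        if score(peer) > score(best):
            best = peer
    return best
```

For example, from node `65a1` with peers `d13d`, `d4ab`, `d462`, and `1234`, a message for `d46a` goes to `d462` (three shared digits). From `d462` with peers `d467` and `d46c`, prefix lengths tie, so the numerically closer `d46c` is chosen.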
  8. 8. The Problem of Churn <ul><li>Churn is the continuous process of node arrival and departure in DHTs. </li></ul><ul><li>One metric of churn is session time: the time between when a node joins the network and the next time it leaves. Lifetime and availability are also relevant. </li></ul><ul><li>Even the temporary loss of a routing neighbor weakens the correctness and performance of a DHT. </li></ul><ul><li>Unavailability of neighbors reduces a node’s effective connectivity, forcing it to choose suboptimal routes and increasing the number of lookup failures. </li></ul>
  9. 9. Experiment Results <ul><li>Pastry fails to complete a majority of lookup requests under heavy churn because nodes wait too long for request messages to time out. </li></ul><ul><li>Chord performs well and returns consistent results under lower churn rates, but its shortcoming is lookup latency. </li></ul><ul><li>Under churn, a DHT may: </li></ul><ul><ul><li>fail to complete a lookup request; </li></ul></ul><ul><ul><li>complete the lookup but return inconsistent results; </li></ul></ul><ul><ul><li>complete the lookup and return consistent results but suffer a dramatic increase in lookup latency. </li></ul></ul>
  10. 10. Handling Churn <ul><li>Factors that affect the behavior of DHTs under churn: </li></ul><ul><ul><ul><li>Reactive versus periodic recovery from neighbor failures. </li></ul></ul></ul><ul><ul><ul><li>Calculation of good timeout values for lookup messages. </li></ul></ul></ul><ul><ul><ul><li>Techniques to achieve proximity in neighbor selection. </li></ul></ul></ul>
  11. 11. Reactive Recovery <ul><li>A node reacts to the loss of one of its existing leaf set neighbors by sending a copy of its leaf set to every node in it. </li></ul><ul><li>The total number of messages is O(k^2) in a k-node network. </li></ul><ul><li>Without churn it is very efficient, as messages are sent only in response to actual changes. </li></ul><ul><li>It consumes more bandwidth as the churn rate increases. </li></ul>
  12. 12. Periodic Recovery <ul><li>A node periodically shares its leaf set with each node in the leaf set. </li></ul><ul><li>Sharing with a single random node instead is a further improvement. </li></ul><ul><li>The number of messages exchanged is O(log k). </li></ul><ul><li>It aggregates all changes in each period into a single message. </li></ul><ul><li>It avoids the positive feedback cycles that can arise in reactive recovery. </li></ul>
  13. 13. Timeout Calculation <ul><li>Before rerouting a request to an alternative neighbor, a node should make sure the timeout for that request was chosen judiciously. </li></ul><ul><li>Techniques: </li></ul><ul><ul><li>TCP-style timeouts (recursive routing) </li></ul></ul><ul><ul><li>RTO = AVG + 4 * VAR, where AVG is the observed average round-trip time and VAR its variation. </li></ul></ul><ul><ul><li>Timeouts from virtual coordinates (iterative routing) </li></ul></ul><ul><ul><li>Coordinates are computed with a distributed machine learning algorithm. </li></ul></ul><ul><ul><li>The coordinate distance D(N1, N2) between nodes N1 and N2 is proportional to the network latency between them. </li></ul></ul><ul><ul><li>Both strategies provide similar mean latency at low churn rates. </li></ul></ul>
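The TCP-style formula RTO = AVG + 4 * VAR can be sketched as a per-neighbor estimator. The smoothing gains `alpha = 1/8` and `beta = 1/4`, the initial variation of half the first sample, and the use of mean deviation for VAR are borrowed from TCP's standard retransmission timer and are assumptions here; the slides do not state them. The class name `RtoEstimator` and the `initial` fallback are hypothetical.

```python
class RtoEstimator:
    """Tracks round-trip times to one neighbor and derives a lookup timeout."""

    def __init__(self, alpha=0.125, beta=0.25, initial=1000.0):
        self.alpha = alpha        # gain for the running average (assumed)
        self.beta = beta          # gain for the running variation (assumed)
        self.initial = initial    # timeout in ms before any sample (assumed)
        self.avg = None
        self.var = None

    def observe(self, rtt_ms: float) -> None:
        """Fold one measured round-trip time into AVG and VAR."""
        if self.avg is None:
            self.avg = rtt_ms
            self.var = rtt_ms / 2
        else:
            # Update VAR first so it uses the previous average.
            self.var = (1 - self.beta) * self.var + self.beta * abs(rtt_ms - self.avg)
            self.avg = (1 - self.alpha) * self.avg + self.alpha * rtt_ms

    def timeout(self) -> float:
        """RTO = AVG + 4 * VAR, or the initial value if nothing was measured."""
        if self.avg is None:
            return self.initial
        return self.avg + 4 * self.var
```

After a first sample of 100 ms this sketch yields a 300 ms timeout; a second 100 ms sample shrinks the variation term and the timeout drops to 250 ms, so a stable neighbor is declared unresponsive sooner than a jittery one.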
  14. 14. Proximity Neighbor Selection <ul><li>Proximity neighbor selection is the process of choosing among the potential neighbors for any given routing table entry according to their network latency to the choosing node. </li></ul><ul><li>Techniques: </li></ul><ul><li>Neighbors’ neighbors. </li></ul><ul><li>Neighbors’ inverse neighbors. </li></ul>
  15. 15. Proximity Neighbor Selection <ul><li>Algorithm: </li></ul><ul><li>- Use one of the techniques above to find nodes that may be near the local node. </li></ul><ul><li>- Measure the latency to those nodes. </li></ul><ul><li>- If the routing table has no existing neighbor for the entry, fill it with the new node; if a neighbor is already present, replace it with the new node when the new node has lower latency. </li></ul>
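The last step of the algorithm can be sketched as a routing table update. The table layout (a dict from a slot identifier to a `(node, latency)` pair), the function name `consider_candidate`, and the rule that an existing neighbor is replaced only by a lower-latency candidate are assumptions made for illustration.

```python
def consider_candidate(table, slot, candidate, latency_ms):
    """Offer a measured candidate for one routing table slot.

    table maps slot -> (node, latency_ms). The candidate is installed if
    the slot is empty or the candidate is closer than the current
    neighbor. Returns True when the table changed.
    """
    current = table.get(slot)
    if current is None or latency_ms < current[1]:
        table[slot] = (candidate, latency_ms)
        return True
    return False
```

A caller would first gather candidates from its neighbors' neighbors or inverse neighbors, measure the latency to each, and then offer them one by one, so each slot converges toward the nearest known node for that entry.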
