Distributed Hash Tables
Amir H. Payberah
amir@sics.se
Amirkabir University of Technology
(Tehran Polytechnic)
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 1 / 62
What is the Problem?
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 2 / 62
What is the Problem?
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 3 / 62
Possible Solutions (1/3)
Central directory
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 4 / 62
Possible Solutions (2/3)
Flooding
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 5 / 62
Possible Solutions (3/3)
Distributed Hash Table (DHT)
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 6 / 62
Distributed Hash Table
(DHT)
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 7 / 62
Distributed Hash Table
An ordinary hash-table, which is ...
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 8 / 62
Distributed Hash Table
An ordinary hash-table, which is distributed.
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 9 / 62
Steps to Build a DHT
Step 1: decide on common key space for nodes and values.
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 10 / 62
Steps to Build a DHT
Step 1: decide on common key space for nodes and values.
Step 2: connect the nodes smartly.
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 10 / 62
Steps to Build a DHT
Step 1: decide on common key space for nodes and values.
Step 2: connect the nodes smartly.
Step 3: make a strategy for assigning items to nodes.
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 10 / 62
Chord: an Example of a DHT
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 11 / 62
Construct Chord - Step 1
Use a logical name space, called the id space, consisting of identifiers
{0, 1, 2, · · · , N − 1}.
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 12 / 62
Construct Chord - Step 1
Use a logical name space, called the id space, consisting of identifiers
{0, 1, 2, · · · , N − 1}.
Id space is a logical ring modulo N.
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 12 / 62
Construct Chord - Step 1
Use a logical name space, called the id space, consisting of identifiers
{0, 1, 2, · · · , N − 1}.
Id space is a logical ring modulo N.
Every node picks a random id though Hash H.
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 12 / 62
Construct Chord - Step 1
Use a logical name space, called the id space, consisting of identifiers
{0, 1, 2, · · · , N − 1}.
Id space is a logical ring modulo N.
Every node picks a random id though Hash H.
Example:
• Space N = 16{0, · · · , 15}
• Five nodes a, b, c, d, e.
• H(a) = 6
• H(b) = 5
• H(c) = 0
• H(d) = 11
• H(e) = 2
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 12 / 62
Construct Chord - Step 2 (1/2)
The successor of an id is the first node met going in clockwise
direction starting at the id.
succ(x): is the first node on the ring with id greater than or equal
x.
• succ(12) = 0
• succ(1) = 2
• succ(6) = 6
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 13 / 62
Construct Chord - Step 2 (2/2)
Each node points to its successor.
The successor of a node n is succ(n + 1).
• 0’s successor is succ(1) = 2.
• 2’s successor is succ(3) = 5.
• 11’s successor is succ(12) = 0.
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 14 / 62
Construct Chord - Step 3
Where to store data?
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 15 / 62
Construct Chord - Step 3
Where to store data?
Use globally known hash function H.
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 15 / 62
Construct Chord - Step 3
Where to store data?
Use globally known hash function H.
Each item key, value gets identifier H(key) = k.
• Space N = 16{0, · · · , 15}
• Five nodes a, b, c, d, e.
• H(Fatemeh) = 12
• H(Cosmin) = 2
• H(Seif) = 9
• H(Sarunas) = 14
• H(Tallat) = 4
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 15 / 62
Construct Chord - Step 3
Where to store data?
Use globally known hash function H.
Each item key, value gets identifier H(key) = k.
• Space N = 16{0, · · · , 15}
• Five nodes a, b, c, d, e.
• H(Fatemeh) = 12
• H(Cosmin) = 2
• H(Seif) = 9
• H(Sarunas) = 14
• H(Tallat) = 4
Store each item at its successor.
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 16 / 62
Construct Chord - Step 3
Where to store data?
Use globally known hash function H.
Each item key, value gets identifier H(key) = k.
• Space N = 16{0, · · · , 15}
• Five nodes a, b, c, d, e.
• H(Fatemeh) = 12
• H(Cosmin) = 2
• H(Seif) = 9
• H(Sarunas) = 14
• H(Tallat) = 4
Store each item at its successor.
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 16 / 62
Construct Chord - Step 3
Where to store data?
Use globally known hash function H.
Each item key, value gets identifier H(key) = k.
• Space N = 16{0, · · · , 15}
• Five nodes a, b, c, d, e.
• H(Fatemeh) = 12
• H(Cosmin) = 2
• H(Seif) = 9
• H(Sarunas) = 14
• H(Tallat) = 4
Store each item at its successor.
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 17 / 62
How to Lookup?
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 18 / 62
Lookup (1/2)
To lookup a key k:
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 19 / 62
Lookup (1/2)
To lookup a key k:
• Calculate H(k).
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 19 / 62
Lookup (1/2)
To lookup a key k:
• Calculate H(k).
• Follow succ pointers until item k is found.
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 19 / 62
Lookup (1/2)
To lookup a key k:
• Calculate H(k).
• Follow succ pointers until item k is found.
Example:
• Lookup Seif at node 2.
• H(Seif) = 9
• Traverse nodes: 2, 5, 6, 11
• Return Stockholm to initiator
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 19 / 62
Lookup (2/2)
Algorithm 1 Ask node n to find the successor of id
1: procedure n.findSuccessor(id)
2: if pred = Ø and id ∈ (pred, n] then
3: return n
4: else if id ∈ (n, succ] then
5: return succ
6: else// forward the query around the circle
7: return succ.findSuccessor(id)
8: end if
9: end procedure
(a, b] the segment of the ring moving clockwise from but not includ-
ing a until and including b.
n.foo(.) denotes an RPC of foo(.) to node n.
n.bar denotes and RPC to fetch the value of the variable bar in
node n.
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 20 / 62
Put and Get
Algorithm 2 Store value with key id in the DHT
1: procedure n.put(id, value)
2: n = findSuccessor(id)
3: s.store(id, value)
4: end procedure
Algorithm 3 Retrieve the value of the key id from the DHT
1: procedure n.get(id)
2: n = findSuccessor(id)
3: return s.retrieve(id)
4: end procedure
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 21 / 62
Any Improvement?
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 22 / 62
Improvement
Speeding up lookups.
If only the successor pointers are used:
• Worst case lookup time is N, for N nodes.
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 23 / 62
Speeding up Lookups (1/2)
Finger/routing table:
• Point to succ(n + 1)
• Point to succ(n + 2)
• Point to succ(n + 4)
• ...
• Point to succ(n + 2M−1
) (N = 2M
, N: the id space size)
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 24 / 62
Speeding up Lookups (1/2)
Finger/routing table:
• Point to succ(n + 1)
• Point to succ(n + 2)
• Point to succ(n + 4)
• ...
• Point to succ(n + 2M−1
) (N = 2M
, N: the id space size)
Distance always halved to the
destination.
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 24 / 62
Speeding up Lookups (2/2)
Every node n knows succ(n + 2i−1) for i = 1, · · · , M.
Size of routing tables is logarithmic:
• Routing table size: M, where N = 2M
.
• Routing entries = log2(N).
• Example: Log2(1000000) ≈ 20
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 25 / 62
Lookup Improvement (1/3)
Algorithm 4 Ask node n to find the successor of id
1: procedure n.findSuccessor(id)
2: if pred = Ø and id ∈ (pred, n] then
3: return n
4: else if id ∈ (n, succ] then
5: return succ
6: else// forward the query around the circle
7: return succ.findSuccessor(id)
8: end if
9: end procedure
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 26 / 62
Lookup Improvement (2/3)
Algorithm 5 Ask node n to find the successor of id
1: procedure n.findSuccessor(id)
2: if pred = Ø and id ∈ (pred, n] then
3: return n
4: else if id ∈ (n, succ] then
5: return succ
6: else// forward the query around the circle
7: p ← closestPrecedingNode(id)
8: return p.findSuccessor(id)
9: end if
10: end procedure
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 27 / 62
Lookup Improvement (3/3)
Algorithm 6 Search locally for the highest predecessor of id
1: procedure closestPrecedingNode(id)
2: for i = m downto 1 do
3: if finget[i] ∈ (n, id) then
4: return finger[i]
5: end if
6: end for
7: end procedure
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 28 / 62
Lookups (1/7)
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 29 / 62
Lookups (2/7)
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 30 / 62
Lookups (3/7)
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 31 / 62
Lookups (4/7)
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 32 / 62
Lookups (5/7)
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 33 / 62
Lookups (6/7)
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 34 / 62
Lookups (7/7)
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 35 / 62
How to Maintain the Ring?
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 36 / 62
Periodic Stabilization (1/2)
In Chord, in addition to the successor pointer, every node has a
predecessor pointer.
• Predecessor of node n is the first node met in anti-clockwise
direction starting at n.
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 37 / 62
Periodic Stabilization (1/2)
In Chord, in addition to the successor pointer, every node has a
predecessor pointer.
• Predecessor of node n is the first node met in anti-clockwise
direction starting at n.
Periodic stabilization is used to make pointers eventually correct.
• Pointing succ to closest alive
successor.
• Pointing pred to closest alive
predecessor.
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 37 / 62
Periodic Stabilization (2/2)
Algorithm 7 Periodically at n
1: procedure n.stabilize()
2: v ← succ.pred
3: if v = Ø and v ∈ (n, succ] then
4: succ ← v
5: end if
6: send notify(n) to succ
7: end procedure
Algorithm 8 Upon receipt a notify(p) at node m
1: on receive notify | p from n do
2: if pred = Ø or p ∈ (pred, m] then
3: pred ← p
4: end if
5: end event
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 38 / 62
Handling Join
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 39 / 62
Handling Join
When n joins:
• Find n’s successor with lookup(n).
• Set succ to n’s successor.
• Stabilization fixes the rest.
Algorithm 9 Join a Chord ring containing node m
1: procedure n.join(m)
2: pred ← Ø
3: succ ← m.findSuccessor(n)
4: end procedure
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 40 / 62
Join (1/5)
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 41 / 62
Join (2/5)
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 42 / 62
Join (3/5)
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 43 / 62
Join (4/5)
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 44 / 62
Join (5/5)
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 45 / 62
Fix Fingers (1/4)
Periodically refresh finger table entries, and store the index of the
next finger to fix.
Algorithm 10 When receiving notify(p) at n
1: procedure n.fixFingers()
2: next ← next + 1
3: if next > m then
4: next ← 1
5: end if
6: finger[next] ← findSuccessor(n ⊕ 2next−1)
7: end procedure
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 46 / 62
Fix Fingers (2/4)
Current situation: succ(N48) = N60
succ(21 ⊕ 25) = succ(53) = N60
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 47 / 62
Fix Fingers (3/4)
succ(21 ⊕ 25) = succ(53) =?
New node N56 joins and stabilizes successor pointer.
Finger 6 of node N21 is wrong now.
N21 eventually try to fix finger 6 by looking up 53 which stops at
N48.
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 48 / 62
Fix Fingers (4/4)
succ(21 ⊕ 25) = succ(53) = N56
N48 will eventually stabilize its successor.
This means the ring is correct now.
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 49 / 62
Handling Failure
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 50 / 62
Successor List (1/2)
A node has a successors list of size r containing the immediate r
successors.
• succ(n + 1)
• succ(succ(n + 1) + 1)
• succ(succ(succ(n + 1) + 1) + 1)
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 51 / 62
Successor List (1/2)
A node has a successors list of size r containing the immediate r
successors.
• succ(n + 1)
• succ(succ(n + 1) + 1)
• succ(succ(succ(n + 1) + 1) + 1)
How big should r be?
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 51 / 62
Successor List (1/2)
A node has a successors list of size r containing the immediate r
successors.
• succ(n + 1)
• succ(succ(n + 1) + 1)
• succ(succ(succ(n + 1) + 1) + 1)
How big should r be? log2(N)
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 51 / 62
Successor List (2/2)
Algorithm 11 Join a Chord ring containing node m
1: procedure n.join(m)
2: pred ← Ø
3: succ ← m.findSuccessor(n)
4: updateSuccesorList(succ.successorList)
5: end procedure
Algorithm 12 Periodically at n
1: procedure n.stabilize()
2: succ ← find first alive node in successor list
3: v ← succ.pred
4: if v = Ø and v ∈ (n, succ] then
5: succ ← v
6: end if
7: send notify(n) to succ updateSuccessorList(succ.successorList)
8: end procedure
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 52 / 62
Dealing with Failure
Periodic stabilization
If successor fails: replace with closest alive successor
If predecessor fails: set predecessor to nil
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 53 / 62
Failure (1/5)
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 54 / 62
Failure (2/5)
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 55 / 62
Failure (3/5)
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 56 / 62
Failure (4/5)
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 57 / 62
Failure (5/5)
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 58 / 62
Summary
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 59 / 62
Summary
DHTs: distributed key, value
Lookup service
Put and Get
Finger list: improve the lookup
Periodically stabilization
Successor list
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 60 / 62
References:
Ion Stoica et al., Chord: A scalable peer-to-peer lookup service for
internet applications, SIGCOMM, 2001.
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 61 / 62
Questions?
Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 62 / 62

Distributed Hash Table

  • 1.
    Distributed Hash Tables AmirH. Payberah amir@sics.se Amirkabir University of Technology (Tehran Polytechnic) Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 1 / 62
  • 2.
    What is theProblem? Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 2 / 62
  • 3.
    What is theProblem? Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 3 / 62
  • 4.
    Possible Solutions (1/3) Centraldirectory Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 4 / 62
  • 5.
    Possible Solutions (2/3) Flooding AmirH. Payberah (Tehran Polytechnic) DHTs 1393/7/12 5 / 62
  • 6.
    Possible Solutions (3/3) DistributedHash Table (DHT) Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 6 / 62
  • 7.
    Distributed Hash Table (DHT) AmirH. Payberah (Tehran Polytechnic) DHTs 1393/7/12 7 / 62
  • 8.
    Distributed Hash Table Anordinary hash-table, which is ... Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 8 / 62
  • 9.
    Distributed Hash Table Anordinary hash-table, which is distributed. Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 9 / 62
  • 10.
    Steps to Builda DHT Step 1: decide on common key space for nodes and values. Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 10 / 62
  • 11.
    Steps to Builda DHT Step 1: decide on common key space for nodes and values. Step 2: connect the nodes smartly. Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 10 / 62
  • 12.
    Steps to Builda DHT Step 1: decide on common key space for nodes and values. Step 2: connect the nodes smartly. Step 3: make a strategy for assigning items to nodes. Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 10 / 62
  • 13.
    Chord: an Exampleof a DHT Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 11 / 62
  • 14.
    Construct Chord -Step 1 Use a logical name space, called the id space, consisting of identifiers {0, 1, 2, · · · , N − 1}. Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 12 / 62
  • 15.
    Construct Chord -Step 1 Use a logical name space, called the id space, consisting of identifiers {0, 1, 2, · · · , N − 1}. Id space is a logical ring modulo N. Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 12 / 62
  • 16.
    Construct Chord -Step 1 Use a logical name space, called the id space, consisting of identifiers {0, 1, 2, · · · , N − 1}. Id space is a logical ring modulo N. Every node picks a random id though Hash H. Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 12 / 62
  • 17.
    Construct Chord -Step 1 Use a logical name space, called the id space, consisting of identifiers {0, 1, 2, · · · , N − 1}. Id space is a logical ring modulo N. Every node picks a random id though Hash H. Example: • Space N = 16{0, · · · , 15} • Five nodes a, b, c, d, e. • H(a) = 6 • H(b) = 5 • H(c) = 0 • H(d) = 11 • H(e) = 2 Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 12 / 62
  • 18.
    Construct Chord -Step 2 (1/2) The successor of an id is the first node met going in clockwise direction starting at the id. succ(x): is the first node on the ring with id greater than or equal x. • succ(12) = 0 • succ(1) = 2 • succ(6) = 6 Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 13 / 62
  • 19.
    Construct Chord -Step 2 (2/2) Each node points to its successor. The successor of a node n is succ(n + 1). • 0’s successor is succ(1) = 2. • 2’s successor is succ(3) = 5. • 11’s successor is succ(12) = 0. Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 14 / 62
  • 20.
    Construct Chord -Step 3 Where to store data? Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 15 / 62
  • 21.
    Construct Chord -Step 3 Where to store data? Use globally known hash function H. Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 15 / 62
  • 22.
    Construct Chord -Step 3 Where to store data? Use globally known hash function H. Each item key, value gets identifier H(key) = k. • Space N = 16{0, · · · , 15} • Five nodes a, b, c, d, e. • H(Fatemeh) = 12 • H(Cosmin) = 2 • H(Seif) = 9 • H(Sarunas) = 14 • H(Tallat) = 4 Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 15 / 62
  • 23.
    Construct Chord -Step 3 Where to store data? Use globally known hash function H. Each item key, value gets identifier H(key) = k. • Space N = 16{0, · · · , 15} • Five nodes a, b, c, d, e. • H(Fatemeh) = 12 • H(Cosmin) = 2 • H(Seif) = 9 • H(Sarunas) = 14 • H(Tallat) = 4 Store each item at its successor. Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 16 / 62
  • 24.
    Construct Chord -Step 3 Where to store data? Use globally known hash function H. Each item key, value gets identifier H(key) = k. • Space N = 16{0, · · · , 15} • Five nodes a, b, c, d, e. • H(Fatemeh) = 12 • H(Cosmin) = 2 • H(Seif) = 9 • H(Sarunas) = 14 • H(Tallat) = 4 Store each item at its successor. Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 16 / 62
  • 25.
    Construct Chord -Step 3 Where to store data? Use globally known hash function H. Each item key, value gets identifier H(key) = k. • Space N = 16{0, · · · , 15} • Five nodes a, b, c, d, e. • H(Fatemeh) = 12 • H(Cosmin) = 2 • H(Seif) = 9 • H(Sarunas) = 14 • H(Tallat) = 4 Store each item at its successor. Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 17 / 62
  • 26.
    How to Lookup? AmirH. Payberah (Tehran Polytechnic) DHTs 1393/7/12 18 / 62
  • 27.
    Lookup (1/2) To lookupa key k: Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 19 / 62
  • 28.
    Lookup (1/2) To lookupa key k: • Calculate H(k). Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 19 / 62
  • 29.
    Lookup (1/2) To lookupa key k: • Calculate H(k). • Follow succ pointers until item k is found. Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 19 / 62
  • 30.
    Lookup (1/2) To lookupa key k: • Calculate H(k). • Follow succ pointers until item k is found. Example: • Lookup Seif at node 2. • H(Seif) = 9 • Traverse nodes: 2, 5, 6, 11 • Return Stockholm to initiator Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 19 / 62
  • 31.
    Lookup (2/2) Algorithm 1Ask node n to find the successor of id 1: procedure n.findSuccessor(id) 2: if pred = Ø and id ∈ (pred, n] then 3: return n 4: else if id ∈ (n, succ] then 5: return succ 6: else// forward the query around the circle 7: return succ.findSuccessor(id) 8: end if 9: end procedure (a, b] the segment of the ring moving clockwise from but not includ- ing a until and including b. n.foo(.) denotes an RPC of foo(.) to node n. n.bar denotes and RPC to fetch the value of the variable bar in node n. Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 20 / 62
  • 32.
    Put and Get Algorithm2 Store value with key id in the DHT 1: procedure n.put(id, value) 2: n = findSuccessor(id) 3: s.store(id, value) 4: end procedure Algorithm 3 Retrieve the value of the key id from the DHT 1: procedure n.get(id) 2: n = findSuccessor(id) 3: return s.retrieve(id) 4: end procedure Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 21 / 62
  • 33.
    Any Improvement? Amir H.Payberah (Tehran Polytechnic) DHTs 1393/7/12 22 / 62
  • 34.
    Improvement Speeding up lookups. Ifonly the successor pointers are used: • Worst case lookup time is N, for N nodes. Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 23 / 62
  • 35.
    Speeding up Lookups(1/2) Finger/routing table: • Point to succ(n + 1) • Point to succ(n + 2) • Point to succ(n + 4) • ... • Point to succ(n + 2M−1 ) (N = 2M , N: the id space size) Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 24 / 62
  • 36.
    Speeding up Lookups(1/2) Finger/routing table: • Point to succ(n + 1) • Point to succ(n + 2) • Point to succ(n + 4) • ... • Point to succ(n + 2M−1 ) (N = 2M , N: the id space size) Distance always halved to the destination. Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 24 / 62
  • 37.
    Speeding up Lookups(2/2) Every node n knows succ(n + 2i−1) for i = 1, · · · , M. Size of routing tables is logarithmic: • Routing table size: M, where N = 2M . • Routing entries = log2(N). • Example: Log2(1000000) ≈ 20 Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 25 / 62
  • 38.
    Lookup Improvement (1/3) Algorithm4 Ask node n to find the successor of id 1: procedure n.findSuccessor(id) 2: if pred = Ø and id ∈ (pred, n] then 3: return n 4: else if id ∈ (n, succ] then 5: return succ 6: else// forward the query around the circle 7: return succ.findSuccessor(id) 8: end if 9: end procedure Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 26 / 62
  • 39.
    Lookup Improvement (2/3) Algorithm5 Ask node n to find the successor of id 1: procedure n.findSuccessor(id) 2: if pred = Ø and id ∈ (pred, n] then 3: return n 4: else if id ∈ (n, succ] then 5: return succ 6: else// forward the query around the circle 7: p ← closestPrecedingNode(id) 8: return p.findSuccessor(id) 9: end if 10: end procedure Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 27 / 62
  • 40.
    Lookup Improvement (3/3) Algorithm6 Search locally for the highest predecessor of id 1: procedure closestPrecedingNode(id) 2: for i = m downto 1 do 3: if finget[i] ∈ (n, id) then 4: return finger[i] 5: end if 6: end for 7: end procedure Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 28 / 62
  • 41.
    Lookups (1/7) Amir H.Payberah (Tehran Polytechnic) DHTs 1393/7/12 29 / 62
  • 42.
    Lookups (2/7) Amir H.Payberah (Tehran Polytechnic) DHTs 1393/7/12 30 / 62
  • 43.
    Lookups (3/7) Amir H.Payberah (Tehran Polytechnic) DHTs 1393/7/12 31 / 62
  • 44.
    Lookups (4/7) Amir H.Payberah (Tehran Polytechnic) DHTs 1393/7/12 32 / 62
  • 45.
    Lookups (5/7) Amir H.Payberah (Tehran Polytechnic) DHTs 1393/7/12 33 / 62
  • 46.
    Lookups (6/7) Amir H.Payberah (Tehran Polytechnic) DHTs 1393/7/12 34 / 62
  • 47.
    Lookups (7/7) Amir H.Payberah (Tehran Polytechnic) DHTs 1393/7/12 35 / 62
  • 48.
    How to Maintainthe Ring? Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 36 / 62
  • 49.
    Periodic Stabilization (1/2) InChord, in addition to the successor pointer, every node has a predecessor pointer. • Predecessor of node n is the first node met in anti-clockwise direction starting at n. Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 37 / 62
  • 50.
    Periodic Stabilization (1/2) InChord, in addition to the successor pointer, every node has a predecessor pointer. • Predecessor of node n is the first node met in anti-clockwise direction starting at n. Periodic stabilization is used to make pointers eventually correct. • Pointing succ to closest alive successor. • Pointing pred to closest alive predecessor. Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 37 / 62
  • 51.
    Periodic Stabilization (2/2) Algorithm7 Periodically at n 1: procedure n.stabilize() 2: v ← succ.pred 3: if v = Ø and v ∈ (n, succ] then 4: succ ← v 5: end if 6: send notify(n) to succ 7: end procedure Algorithm 8 Upon receipt a notify(p) at node m 1: on receive notify | p from n do 2: if pred = Ø or p ∈ (pred, m] then 3: pred ← p 4: end if 5: end event Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 38 / 62
  • 52.
    Handling Join Amir H.Payberah (Tehran Polytechnic) DHTs 1393/7/12 39 / 62
  • 53.
    Handling Join When njoins: • Find n’s successor with lookup(n). • Set succ to n’s successor. • Stabilization fixes the rest. Algorithm 9 Join a Chord ring containing node m 1: procedure n.join(m) 2: pred ← Ø 3: succ ← m.findSuccessor(n) 4: end procedure Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 40 / 62
  • 54.
    Join (1/5) Amir H.Payberah (Tehran Polytechnic) DHTs 1393/7/12 41 / 62
  • 55.
    Join (2/5) Amir H.Payberah (Tehran Polytechnic) DHTs 1393/7/12 42 / 62
  • 56.
    Join (3/5) Amir H.Payberah (Tehran Polytechnic) DHTs 1393/7/12 43 / 62
  • 57.
    Join (4/5) Amir H.Payberah (Tehran Polytechnic) DHTs 1393/7/12 44 / 62
  • 58.
    Join (5/5) Amir H.Payberah (Tehran Polytechnic) DHTs 1393/7/12 45 / 62
  • 59.
    Fix Fingers (1/4) Periodicallyrefresh finger table entries, and store the index of the next finger to fix. Algorithm 10 When receiving notify(p) at n 1: procedure n.fixFingers() 2: next ← next + 1 3: if next > m then 4: next ← 1 5: end if 6: finger[next] ← findSuccessor(n ⊕ 2next−1) 7: end procedure Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 46 / 62
  • 60.
    Fix Fingers (2/4) Currentsituation: succ(N48) = N60 succ(21 ⊕ 25) = succ(53) = N60 Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 47 / 62
  • 61.
    Fix Fingers (3/4) succ(21⊕ 25) = succ(53) =? New node N56 joins and stabilizes successor pointer. Finger 6 of node N21 is wrong now. N21 eventually try to fix finger 6 by looking up 53 which stops at N48. Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 48 / 62
  • 62.
    Fix Fingers (4/4) succ(21⊕ 25) = succ(53) = N56 N48 will eventually stabilize its successor. This means the ring is correct now. Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 49 / 62
  • 63.
    Handling Failure Amir H.Payberah (Tehran Polytechnic) DHTs 1393/7/12 50 / 62
  • 64.
    Successor List (1/2) Anode has a successors list of size r containing the immediate r successors. • succ(n + 1) • succ(succ(n + 1) + 1) • succ(succ(succ(n + 1) + 1) + 1) Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 51 / 62
  • 65.
    Successor List (1/2) Anode has a successors list of size r containing the immediate r successors. • succ(n + 1) • succ(succ(n + 1) + 1) • succ(succ(succ(n + 1) + 1) + 1) How big should r be? Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 51 / 62
  • 66.
    Successor List (1/2) Anode has a successors list of size r containing the immediate r successors. • succ(n + 1) • succ(succ(n + 1) + 1) • succ(succ(succ(n + 1) + 1) + 1) How big should r be? log2(N) Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 51 / 62
  • 67.
    Successor List (2/2) Algorithm11 Join a Chord ring containing node m 1: procedure n.join(m) 2: pred ← Ø 3: succ ← m.findSuccessor(n) 4: updateSuccesorList(succ.successorList) 5: end procedure Algorithm 12 Periodically at n 1: procedure n.stabilize() 2: succ ← find first alive node in successor list 3: v ← succ.pred 4: if v = Ø and v ∈ (n, succ] then 5: succ ← v 6: end if 7: send notify(n) to succ updateSuccessorList(succ.successorList) 8: end procedure Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 52 / 62
  • 68.
    Dealing with Failure Periodicstabilization If successor fails: replace with closest alive successor If predecessor fails: set predecessor to nil Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 53 / 62
  • 69.
    Failure (1/5) Amir H.Payberah (Tehran Polytechnic) DHTs 1393/7/12 54 / 62
  • 70.
    Failure (2/5) Amir H.Payberah (Tehran Polytechnic) DHTs 1393/7/12 55 / 62
  • 71.
    Failure (3/5) Amir H.Payberah (Tehran Polytechnic) DHTs 1393/7/12 56 / 62
  • 72.
    Failure (4/5) Amir H.Payberah (Tehran Polytechnic) DHTs 1393/7/12 57 / 62
  • 73.
    Failure (5/5) Amir H.Payberah (Tehran Polytechnic) DHTs 1393/7/12 58 / 62
  • 74.
    Summary Amir H. Payberah(Tehran Polytechnic) DHTs 1393/7/12 59 / 62
  • 75.
    Summary DHTs: distributed key,value Lookup service Put and Get Finger list: improve the lookup Periodically stabilization Successor list Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 60 / 62
  • 76.
    References: Ion Stoica etal., Chord: A scalable peer-to-peer lookup service for internet applications, SIGCOMM, 2001. Amir H. Payberah (Tehran Polytechnic) DHTs 1393/7/12 61 / 62
  • 77.
    Questions? Amir H. Payberah(Tehran Polytechnic) DHTs 1393/7/12 62 / 62