Chord presentation

  • 2,870 views
Uploaded on

 

More in: Technology , Education
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Be the first to comment
No Downloads

Views

Total Views
2,870
On Slideshare
0
From Embeds
0
Number of Embeds
0

Actions

Shares
Downloads
121
Comments
0
Likes
4

Embeds 0

No embeds

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
    No notes for slide
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n
  • \n

Transcript

  • 1. Chord: A Scalable Peer- to-peer Lookup Protocol for Internet Applications Ion Stoica, Robert Morris, David Liben-Nowell, David R.Karger, M. Frans Kaashoek, Frank Dabek Hari Balakrishnan
  • 2. About Chord• Protocol and algorithm for distributed hash table (DHT)• Structured-decentralized P2P overlay network
  • 3. About Chord• Supports just one operation: lookup(key) • Given a key, it maps the key onto a node • Returns the IP-address of the node• Associate a value with each key and store the pair on the resulting node: • Distributed hash table• Uses consistent hashing for this operation
  • 4. Chord properties (1)• Load balancing: • spreads keys evenly over nodes• Decentralized: • fully distributed, all nodes equally important • does not take into account heterogeneity of nodes
  • 5. Chord properties (2)• Scalability: • lookup cost: O(log(N))• Availability: • automatically updates lookup tables as nodes join, leave, fail• Flexible naming: • Flat key-space, no constraint on structure of keys
  • 6. Traditional hashing• Given an object, assign it to a bin from a set of N bins• Each bin: roughly same amount of objects• Problem: • If N changes, all objects need to be reassigned
  • 7. Consistent hashing• Evenly distributes K objects across N bins, about K/N each• When N changes: • Not all objects need to be moved • Only O(K/N) need to• Uses a base hash function, such as SHA-1
  • 8. Chord protocol• Assign each node and key an m-bit identifier using SHA-1: • node: hash the IP-address • key: hash the key• Identifiers are ordered on an identifier circle module 2m, the Chord ring: • circle of numbers from 0 to 2m-1
  • 9. Assigning keys to nodes • Assigning key k to a node: • first node whose identifier is equal to or follows the identifier of key k in the identifier space • successor node of k: successor(k) • first node clockwise from k on Chord ring
  • 10. Chord ring (m=6) N1 N56 N8 K10 K54 N51 N14 N48 N21 N42 K24 K38 N38 N32 K30
  • 11. Minimal disruption• Minimal disruption when node joins/leaves• n joins: • certain keys assigned to successor(n) are now assigned to n• n leaves: • all keys assigned to n are now assigned to successor(n)
  • 12. Simple Chord lookup• Simplest Chord lookup algorithm: • Each node only needs to know its successor on Chord ring • Query for identifier id is passed around Chord ring until successor(id) is found • Result returned along reverse path• Easy but not efficient: # messages O(N)
  • 13. Simple Chord lookup N1 lookup(K54)/ ask node n to fi nd the successor of id N8 K54 N56n.find successor(id) the successor // ask node n to find if// of id successor]) (id ∈ (n, n.find_successor(id) return successor; N51 N14 else return (n, successor]) if (id ∈ successor; N48 // forward the query around the circle else return successor.fiquery around the // forward the nd successor(id); // circle return successor.find_successor(id); N21 N42 (a) N38 N32 (b)seudocode to fi nd the successor node of an identifi er id. Remote procedure calls and variable lookups a query from node 8 for key 54, using the pseudocode in Figure 3(a).
  • 14. Finger table• Chord maintains additional routing info: • not essential for correctness • but allows acceleration of lookups• Each node maintains finger table with m entries: • n.finger[i] = successor(n + 2i-1)
  • 15. Finger tablele (but slow) pseudocode to fi nd the successor node of an identifi er id. Reme path taken by a query from node 8 for key 54, using the pseudocode in Figu N1 Finger table N8 +1 N8 + 1 N14 N8 + 2 N14 K54 N +2 N8 + 4 N14 +4 N8 + 8 N21 N51 N51 +32 +8 N14 N8 +16 N32 N8 +32 N42 N48 +16 N48 N21 N42 N N38 N32
  • 16. Improved lookup• Lookup for id now works as follows: • If id falls between n and successor(n), successor(n) is returned • Otherwise, lookup is performed at node n’, which is the node in the finger table of n that most immediately precedes id• Since each node has finger entries at power of two intervals around the identifier circle, each node can forward a query at least halfway along the remaining distance between the node and the target identifier.• Thus O(log(N)) nodes need to be contacted
  • 17. N32 (b) Chord protocol successor node of an identifi er id. Remote procedure calls and variable lookups arfor key 54, using the pseudocode in Figure 3(a).1 N1 Finger table lookup(54) // ask node n to find the successor N8 // of id +1 N8 + 1 N14 N8 n.find_successor(id) N8 + 2 N14 K54 N56 if (id ∈ (n, successor]) N14 N8 + 4 +2 return successor; + 8 N21 +4 N8 else N51 +8 N14 N8 +16 N32 N14 n′ = closest_preceding_node(id); N8 +32 N4216 return n′.find_successor(id); N48 // search the local table for the // highest predecessor of id n.closest_preceding_node(id) N21 N21 for i = m downto 1 if (finger[i] ∈ (n, id)) N42 return finger[i]; return n; N38 N32(a) (b)
  • 18. Coping with joining/leaving • Essential that successor pointers stay up to date • Each node periodically runs stabilization protocol which updates Chord’s successor pointers and finger tables • For node n to join, it must know a node n’ of the Chord ring
  • 19. Coping with joining/leaving • To join, n performs join() • node n asks n’ to find_successor(n), which n sets as its sucessor • Each node periodically performs stabilize() to learn about newly joined nodes • n asks his successor s for s’ predecessor p • decides whether p should be n’s successor instead
  • 20. Coping with joining/leaving• stabilize() also notifies n’s successor s of n’s existence • s might change it’s predecessor to n• Each node periodically calls fix_fingers() • makes sure fingers are up to date • is how new nodes acquire finger table • is how old nodes add new nodes to their tables
  • 21. Coping with joining/leaving // n′ thinks it might be our//join a Chord ring containing node n′. // predecessor.n.join(n′) n.notify(n′) predecessor = nil; if (predecessor is nil or successor = n′.find_successor(n); n′ ∈ (predecessor, n)) predecessor = n′;// called periodically. verifies n’s// immediate successor, and tells the// successor about n. // called periodically. refreshesn.stabilize() // finger table entries. next stores x = successor.predecessor; // the index of the next finger to fix. if (x ∈ (n, successor)) n.fix_fingers() successor = x; next = next + 1 ; successor.notify(n); if (next > m) next = 1; finger[next] = find_successor(n+2next−1);
  • 22. Coping with joining/leaving N21 N21 N21 N21 successor(N21) N26 N26 N26 K24 K24 N32 N32 N32 N32 K24 K24 K24 K30 K30 K30 K30 (a) (b) (c) (d) lustrating the join operation. Node 26 joins the system between nodes 21 and 32. The arcs represent the successor relationsh to node 32; (b) node 26 fi nds its successor (i.e., node 32) and points to it; (c) node 26 copies all keys less than 26 from nodeates the successor of node 21 to node 26.most immediate predecessor of id. In addition, its r successors means that it can inform the hig
  • 23. Coping with joining/leaving • Lookup will only fail if successor pointers are not up to date => retry after a pause • As soon as successor pointers are correct, lookup will work • over time, finger tables will be up to date and linear search is no longer needed • Old finger tables can still be used and in general lookup will remain O(log(N)) even if not all finger tables are up to date
  • 24. Successor list• Lookup fails if a node doesn’t know its correct successor • can occur when nodes fail• Solution: each node maintains a successor list: • contains the node’s first r successors • if first successor does not respond, try the rest of the list
  • 25. Replication• Successor list can also be used for replication • application on top of Chord might store replicas of data associated with a key on k nodes from n’s successor list • Chord can inform application when this successor list changes, so application can create new replicas
  • 26. Chord in practice• In practice, Chord ring will never be in stable state: • Joins and leaves occur continuously • Interleaved by stabilization algorithm • Not enough time to stabilize before new change• If stabilization is run at a certain rate: • Chord ring will be continuously “almost stable” and lookup will be fast and correct
  • 27. Virtual nodes• Chord load balancing can be improved through virtual nodes: • node ids do not uniformly cover entire id space • to make (# keys / node) more uniform, associate keys with virtual nodes • map r virtual nodes, with unrelated ids, to each real node• Tradeoff: each real node needs r times as much space to store finger table
  • 28. Network latency• Chord path length is O(log(N)), but latency can be quite large • Nodes close in id space can be far away in the underlying network• Idea: • choose next-hop finger based on progress in id space and latency in network • maximize progress in id space • minimize latency
  • 29. Network latency• Different approach to improving latency: • each entry in finger table can point to list of nodes instead of a single node • nodes have similar ids and are equivalent for routing purposes • latency of the nodes could be different • use closest node in terms of network distance in the underlying network
  • 30. Applications of Chord• Chord File System (CFS): • Chord locates storage blocks• Distributed indexes: • solves Napster’s central server issue• Large-scale combinatorial search: • candidate solutions are keys, Chord maps these to machines which test them• Cooperative mirroring: • Uses Chord to provide load balancing• Time-shared storage: • Uses Chord to provide availability
  • 31. ?