The authors implemented a hybrid Chord/Fusion system to provide 100% fault tolerance for distributed lookups. They combined Chord's distributed lookup protocol with Fusion backups: data is stored on a replicated Chord ring and in fused backups. Testing showed the system handled up to 100% node faults while still answering all lookups successfully. The authors propose extending the system to larger networks and key sets, and testing additional metrics such as message time and path length, to further evaluate the approach.
Chord P2P DHT Protocol with Fusion Backups for 100% Fault Tolerance
1. Chord P2P DHT Protocol with Fusion Backups
Fault Tolerant Innovation
By: Chance Pianta, Adam Whipple, & Abe Arredondo
The University of Texas at Austin, Cockrell School of Engineering 12/03/2015
Term Project for EE 382N: Distributed Sys Opt III PRC, Fall 2015
Dr. Vijay K. Garg, Professor; Wei-Lun Hung, Teaching Assistant
2. Overview
1. What is Chord, Fusion, P2P Types, and Related Research
2. Why and Who Uses P2P
3. Problem and Hypothesis
4. Results and Live Demonstration
5. Weaknesses and Ideas for Future Research
The Innovation
A combination of Chord, replicas, and the Fusion technique to recover from an entire Chord system failure
3. What is P2P, What is Chord [1]
Chord is a distributed P2P lookup protocol
● No centralized control or hierarchical organization
● Operation: given a key, it maps the key onto a node
● Determines the node responsible for storing the key’s value
● Each node maintains routing information for only about O(log N) other nodes
● Resolves all lookups via O(log N) messages to other nodes
● Updating the routing information when nodes leave or join requires only O(log N) messages
● Handles concurrent node arrivals and departures
● Continues to function when a node’s information is only partially correct
● Scales with the number of nodes, recovers from simultaneous node failures and joins, and keeps answering lookups
● Identifiers are ordered on an identifier circle (the Chord ring)
● Key k is assigned to the first node whose identifier is equal to or follows k in the identifier space, called the “successor node of key k,” or successor(k)
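The successor(k) rule above can be sketched in a few lines. This is a minimal illustration on a toy identifier circle, not the authors' implementation: a real Chord node stores only O(log N) finger-table entries instead of the full sorted node list used here, and the hash, ring size, and names are assumptions for the example.

```python
import hashlib

M = 8            # toy identifier space: 2**M positions on the circle
RING = 2 ** M

def chord_id(name: str) -> int:
    """Hash a node name or key onto the identifier circle."""
    return int(hashlib.sha1(name.encode()).hexdigest(), 16) % RING

def successor(key_id: int, node_ids: list[int]) -> int:
    """First node whose identifier is equal to or follows key_id,
    wrapping around the circle if necessary."""
    for nid in sorted(node_ids):
        if nid >= key_id:
            return nid
    return min(node_ids)  # wrap around the ring

# The node returned by successor() is the one responsible for the key.
nodes = sorted(chord_id(f"node-{i}") for i in range(4))
owner = successor(chord_id("some-key"), nodes)
```

With finger tables, each hop halves the remaining distance to the key, which is where the O(log N) lookup cost comes from.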
4. What Are Fusion Backups? [3]
A method to recover from crash faults and byzantine faults in a set of
distributed state machines [4]
● Traditional approach: create state machine replicas and coordinate events between the primary state machine and its replicas
● Fusion approach: use fused state machines to reduce the overhead of normal operation and the message and storage space
● A crash fault is a fault in which a state machine loses its state; the fault is detectable, but the lost state must be restored
● Communication channels must be reliable, FIFO, and have a fixed upper bound on delivery time
● Process crashes must be reliably detectable
How Is Fusion Used in This Study?
An innovative Chord/Replica/Fusion approach to recover from crash faults:
100% fault tolerance for life-and-death medical applications, or critical financial operations such as Bitcoin
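The intuition behind fused backups can be shown with the simplest possible fusion code: XOR. The sketch below is an assumption-laden toy (the states, names, and single-crash XOR code are illustrative; the paper's fused state machines [3] generalize this with coding theory to tolerate f faults), but it shows how one fused backup can cover several primaries where replication would need one full copy per primary.

```python
from functools import reduce

# Toy "state machines": each primary's state is an equal-length list of ints.
primaries = {
    "A": [3, 1, 4],
    "B": [1, 5, 9],
    "C": [2, 6, 5],
}

def xor_columns(rows):
    """Element-wise XOR across the given state vectors."""
    return [reduce(lambda a, b: a ^ b, col) for col in zip(*rows)]

# One fused backup covers all three primaries (vs. three full replicas).
fused = xor_columns(primaries.values())

def recover(crashed: str):
    """Rebuild a crashed primary's state from the fused backup + survivors.
    Works because x ^ x = 0: XORing out the survivors leaves the lost state."""
    survivors = [s for name, s in primaries.items() if name != crashed]
    return xor_columns([fused] + survivors)
```

Note this only helps for crash faults that are detected (we must know *which* machine crashed), matching the crash-fault and reliable-detection assumptions above.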
5. P2P Types
Unstructured
● Napster
● Gnutella
Structured (Other Related Work similar to Chord)
● Pastry: Scalable, decentralized object location and routing for large-scale P2P systems
● PAST: Storage management and caching for a large-scale, persistent P2P storage utility
● Kademlia: Sub-second lookups on a large-scale Kademlia-based P2P overlay
● PGrid: A self-organizing structured P2P system.
● CoopNet: A Social, P2P-Like Simulation Model to Explore Knowledge-Based Processing
6. Structured P2P Networks: Why & Who
Why: (1) redundant storage, (2) group communication, (3) global file sharing,
(4) user-created search engines, (5) virtual supercomputers
Who & Types of Applications -- (1) Chat Tools, (2) Enterprise Instant Messaging,
(3) Private P2P Chat
7. Problem & Hypothesis
● Faults in Chord nodes can result in data loss in the form of failed lookups.
● Hypothesis: a Fusion implementation (the Hybrid Dynamo Design [3]), where data is stored on a replicated Chord ring and in fused backups, will eliminate failed lookups.
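The hypothesized read path can be sketched as a lookup with two fallbacks: try the primary Chord ring, then the replica ring, then reconstruct from the fused backups. Every name here is hypothetical; the slides do not describe the authors' actual API, so this is only a plausible shape for the design under the assumptions above.

```python
def hybrid_lookup(key, primary, replica, fused_recover):
    """Return the value for key, surviving crashes on either ring.

    primary, replica  -- dict-like views of each Chord ring (None if the
                         responsible node has crashed or the key is missing)
    fused_recover     -- callable that decodes the fused backups (last resort)
    """
    for ring in (primary, replica):
        value = ring.get(key)       # ask this ring's successor node
        if value is not None:
            return value
    return fused_recover(key)       # both rings failed: decode fused backups

# k2's node has crashed on the primary ring, so the replica ring answers.
primary = {"k1": "v1"}
replica = {"k1": "v1", "k2": "v2"}
value = hybrid_lookup("k2", primary, replica, lambda k: None)
```

Under this scheme a lookup fails only if the key is lost on both rings *and* the fused backups cannot be decoded, which is consistent with the zero-failed-lookup results reported next.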
8. Results
● Successfully implemented the Hybrid Dynamo Design [3] such that up to 100% node faults result in zero failed lookups.
FAULTS   FAILED LOOKUPS
  1           0
  2           0
  3           0
  4           0
  5           0
  6           0
  7           0
  8           0
  9           0
 10           0
10. Weaknesses & Ideas for Future Research
● The original Chord research evaluated exponentially larger Chord rings (10^4-node networks with 10^5 keys).
○ Implement the same Hybrid Dynamo Design with substantially larger node networks and key sets, then test f faults.
● Chord’s success is based on efficiency (in both memory usage and message complexity).
○ Implement a Fusion design without the secondary replica Chord ring, where the fused backups supply the data of faulty nodes to the remaining nodes.
● This experiment tested failed lookups but did not consider other factors in the Chord
implementation such as message time, path length per node, etc.
○ Consider implementing tests for the above cases to fully understand the strengths
and weaknesses of the Hybrid Dynamo Implementation.
11. References
1. Stoica, I., Morris, R., Liben-Nowell, D., Karger, D., Kaashoek, M., Dabek, F., & Balakrishnan, H. (2003). Chord: A scalable peer-to-peer lookup protocol for internet applications. IEEE/ACM Transactions on Networking (TON), 11(1), 17-32. doi:10.1109/TNET.2002.808407
2. Chord: A Scalable Peer-to-Peer Lookup Protocol for Internet Applications. Presentation by Andrew Cary & Adam Whipple.
3. Balasubramanian, B., & Garg, V. K. (2010). A Fusion-based Approach for Handling Multiple Faults in Distributed Systems.
4. Implementing Fault Tolerant Services Using State Machines: Beyond Replication. Presentation by Walter Scarborough and Maurice Roth-Miller.