IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, VOL. 8, NO. 3, MAY/JUNE 2011 419 Replica Placement for Route Diversity in Tree-Based Routing Distributed Hash Tables Cyrus Harvesf and Douglas M. Blough, Senior Member, IEEE Abstract—Distributed hash tables (DHTs) share storage and routing responsibility among all nodes in a peer-to-peer network. These networks have bounded path length unlike unstructured networks. Unfortunately, nodes can deny access to keys or misroute lookups. We address both of these problems through replica placement. We characterize tree-based routing DHTs and define MAXDISJOINT, a replica placement that creates route diversity for these DHTs. We prove that this placement creates disjoint routes and find the replication degree necessary to produce a desired number of disjoint routes. Using simulations of Pastry (a tree-based routing DHT), we evaluate the impact of MAXDISJOINT on routing robustness compared to other placements when nodes are compromised at random or in a contiguous run. Furthermore, we consider another route diversity mechanism that we call neighbor set routing and show that, when used with our replica placement, it can successfully route messages to a correct replica even with a quarter of the nodes in the system compromised at random. Finally, we demonstrate a family of replica query strategies that can trade off response time and system load. We present a hybrid query strategy that keeps response time low without producing too high a load. Index Terms—Distributed systems, peer-to-peer networks, distributed hash tables, routing, replica placement, robustness. Ç1 INTRODUCTIONP EER-TO-PEER (p2p) networks are a popular substrate for building distributed applications because of their effi-ciency, scalability, resilience to failure, and ability to self- reported, for example, in p2p file sharing systems. DHTs are inherently less resilient to these attacks than unstruc- tured networks because unstructured networks typicallyorganize. The p2p architecture relies on the distribution of broadcast messages, which is a more robust (and muchresponsibility among hundreds of thousands, if not millions, more expensive) mechanism.of nodes in the network. Therefore, if a small set of nodes fail Sit and Morris  classify attacks on DHTs into threeto serve data objects, properly maintain routing information, categories:or route messages, the integrity of a very large-scale systemmay be compromised. 1. storage and retrieval attacks, which target the The efficiency of lookups has become a central focus of manner in which peers manage data items;p2p design because many popular applications, like name 2. routing attacks, which target the manner in which peers route messages; andresolution, publish-subscribe, and IP communication, rely 3. miscellaneous attacks, which target other aspects ofon a lookup service as a core functionality. A p2p distributed the system, such as admission control or the under-hash table (DHT) may be used to provide this functionality. lying network routing service.DHTs , , ,  structure the network topology in away that enables routing algorithms to produce lookup The first class of attack is commonly addressed withpaths of bounded length (typically Oðlog NÞ). replication. Objects are replicated at several peers in the Unfortunately, when deployed over the Internet, DHTs network to increase the likelihood that there will be amay be impacted by the failure or compromise of peers in correct replica available. The benefits of replication on loadthe overlay and performance guarantees no longer hold. In balancing and overall performance have also been studied.fact, it may not be possible to fetch a desired object at all. To our knowledge, ours is the first work that considers howMany p2p networks allow nodes to join without prejudice, the placement of replicas affects object reachability throughleaving the network vulnerable to attack. Furthermore, the the routing infrastructure.network could face coordinated attacks from competitors or Numerous works , ,  have relied on routeother groups that have an interest in the failure of the diversity to mitigate the effects of routing attacks. Srivatsanetwork. This type of coordinated attack behavior has been and Liu  introduced the notion of independent lookup paths to improve routing robustness. Two paths are said to be independent if they share no hops other than the source. C. Harvesf is with Microsoft Corporation, 1 Microsoft Way, Redmond, WA and destination peers. It is worth noting that route diversity 98052. E-mail: email@example.com.. D.M. Blough is with the Department of Electrical and Computer has benefits in addition to improving routing robustness. Engineering, Georgia Institute of Technology, KACB, Room 3356, Atlanta, For example, diverse routes can be used to improve load GA 30332-0765. E-mail: firstname.lastname@example.org. balance and fairness or to circumnavigate congested areasManuscript received 14 Oct. 2008; revised 24 June 2009; accepted 22 Sept. of the network.2009; published online 4 Dec. 2009. Our work realizes the benefits of replication and routeRecommended for acceptance by A. Schiper.For information on obtaining reprints of this article, please send e-mail to: diversity in concert through replica placement. In email@example.com, and reference IEEECS Log Number TDSC-2008-10-0158. paper, we consider a class of DHTs that route messagesDigital Object Identifier no. 10.1109/TDSC.2009.49. using a scheme which we call tree-based routing. We show 1545-5971/11/$26.00 ß 2011 IEEE Published by the IEEE Computer Society
420 IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, VOL. 8, NO. 3, MAY/JUNE 2011that there exists a replica placement, which we call equivalence classes, each of size d. If an object with its ID in aMAXDISJOINT, that creates disjoint routes in DHTs of this particular equivalence class is replicated, replicas are placedtype. We prescribe the number and placement of replicas at all IDs in the class. This is a very general definition, whichnecessary to produce d disjoint routes from any source is completely independent of routing. Thus, for a particularnode to the replica set. With this scheme, we are able to DHT structure, some such schemes could produce a largetolerate d À 1 malicious peers, whether they are attacking number of disjoint routes while others might produce verythe storage and retrieval of data items or the routing few. To realize benefits of this approach for routing security,infrastructure. Our approach is targeted specifically at it is therefore necessary to instantiate particular schemes forDHT-based (structured) p2p systems with a multilevel different DHTs or classes of DHTs and evaluate their routingrouting structure. So-called “one-hop” DHTs ,  are characteristics. This is exactly the problem considered in thisdiscussed in Section 6. paper. The instantiation of symmetric replication that was In order for disjoint routes to improve the robustness of presented in  was equally spaced replication. The paperthe system, the client must be able to verify the integrity of contained a thorough evaluation, which showed that thedata items. This is necessary for the client to detect when a technique reduces the message overhead in node joins andmalicious peer has tampered with the result of a data item leaves, provides better load balance, and improves faultlookup. Therefore, we assume that data in the system are tolerance. However, routing robustness was not considered.self-certifying. This assumption is quite common in peer-to- It is worth noting that the MAXDISJOINT placement ispeer systems , ,  and is discussed in more detail in equivalent to equally spaced replication in Chord. In otherSection 6. DHT implementations, however, MAXDISJOINT provides Using Pastry as an example, we evaluate MAXDISJOINT added flexibility in terms of tuning routing robustness thatthrough simulation and show that a DHT with typical equally spaced replication does not provide.configuration parameters can benefit from our replica Our prior work is the first that considers the impact ofplacement. Our experiments show that with only eight replica placement on routing robustness in DHTs. Our initialreplicas and a quarter of nodes compromised at random, a work focused on the benefits of equally spaced placement innode can find a route, which consists of only uncompro- Chord  and has expanded into the MAXDISJOINTmised nodes, to a correct replica with greater than 97 percent placement , a general solution that gives the benefits ofprobability. MAXDISJOINT also tolerates runs of compro- equally spaced placement in Chord to all DHTs that employmised nodes; with 16 replicas and 85 percent of the DHT a tree-based routing scheme.compromised in a run, lookups can be resolved with greaterthan 96 percent probability. Furthermore, we use a technique 2.2 Peer-to-Peer Routing Securitywhich we call neighbor set routing to increase route diversity A number of works that look to improve routing securityand improve the probability of lookup success. For example, are centered around the notion of route diversity. Fora lookup performed with neighbor set routing and MAXDIS- example, Artigas et al. propose Cyclone, an equivalence-JOINT placement can be resolved successfully with greaterthan 97 percent probability with 40 percent of nodes based routing scheme deployed over an existing structuredcompromised at random. Finally, we demonstrate that the peer-to-peer overlay . Independent lookup paths arestrategy used to query replicas can have a significant impact created by routing across different equivalence classes.on performance and we propose a hybrid query strategy that Since the paths are independent and do not differ in thecan be used to trade off response time and system load for the destination, Cyclone does not naturally mitigate the effectsbest performance. of storage and retrieval attacks. Furthermore, each peer is required to maintain additional routing information, which incurs overhead. In contrast, our replica placement creates2 RELATED WORK disjoint routes without requiring any additional routingTo place our work in context, we discuss related work on state or modifying the underlying routing scheme.replica placement, peer-to-peer routing security, and gen- Portmann et al. use route diversity to provide messageeral peer-to-peer security issues. confidentiality . Messages are split in two, encrypted, and sent to the destination across diverse paths. Route2.1 Replica Placement diversity is created by routing messages through theReplica placement has long been studied in the realm of routing table entries that minimize route overlap. Thedistributed computing. Many studies have compared the appropriate entries are chosen using empirical results thatperformance of different placement schemes in terms of depend on the network size. They show that, in the bestquality of service, availability, and time to recovery in case, this method results in an average path overlap ofdifferent types of serverless systems , , , . The 15-25 percent between a pair of routes. In other words, atfirst DHT-based replication schemes were only concerned best, 15 percent of the routes will be common to bothwith availability and thus local replication, i.e., replicas paths. We show analytically that our placement createsplaced close to the master copy in the ID space, was used multiple nonoverlapping routes., . As detailed herein, such placements have very Castro et al. combine secure node identifier assignment,little routing robustness. secure routing table maintenance, and secure message A very important paper, which proposed the first forwarding to create a secure routing primitive . Securedeterministic nonlocal replica placement scheme for node identifier assignment ensures that an adversary cannotDHT-based systems, was that of Ghodsi et al. . This take arbitrary identifiers. It also ensures a uniform distribu-paper discussed a set of symmetric replica placement schemes tion of compromised nodes. Secure routing table mainte-that, for a replication degree of d, divide the ID space into nance ensures that the average fraction of compromised
HARVESF AND BLOUGH: REPLICA PLACEMENT FOR ROUTE DIVERSITY IN TREE-BASED ROUTING DISTRIBUTED HASH TABLES 421routing table entries does not exceed the fraction of (Plaxton ), or some hybrid (Pastry , Tapestry ).compromised nodes in the network. The geometry impacts neighbor and route selection, which Secure message forwarding guarantees that a message can have an impact on flexibility, resilience, and proximitysent to a key is delivered to all of its replicas with high performance as studied in . Although we use the termprobability. This component most closely resembles our “tree-based routing,” we are not referring to the geometry,work. It relies on route diversity to deliver messages to the but the routing algorithm. Tree-based routing algorithmsneighborhood of destination. Unlike our approach, this have specific properties that we define herein.iterative redundant routing scheme requires modification tothe routing infrastructure of the DHT. One such modification 3.1 Tree-Based Routing DHTsis the forwarding of messages through the neighbors of the Consider a DHT with an ordered id space I with size N ¼source node, which we call neighbor set routing. We will show jIj and a branching factor B such that logB N is integral. Thethat neighbor set routing is useful in creating route diversity branching factor is used by each node to construct itsand can improve on the benefit of our replica placement. routing table. The routing table of a node u has the Mickens and Noble develop a framework for diagnosing following properties:broken overlay routes, whether they result from IP-levellink failures or malicious peers in the overlay . Once the 1. The node u partitions the entire id space intocause of the broken route is detected, the IP-level link is contiguous segments and selects one node fromcircumnavigated or the malicious peer is excluded from the each id segment to include in its routing table. Thesystem. Rather than excluding peers that may have been partitioning is performed as follows:falsely diagnosed as malicious, we use route diversity to a. The node u partitions the id space into B equalavoid faulty nodes. size contiguous parts. It is worth noting that path disjointedness has been b. Of the B id segments, u selects the id segment I 0considered in other contexts as well. Castro et al. propose a of which it is a member.p2p multicast system for high bandwidth content (e.g.,streaming video) . Content is split into “stripes” such that c. Steps 1a and 1b are repeated to repartition I 0 untilthe quality of the content improves with the number of parts of size one are created. The part that consistsstripes downloaded simultaneously. The stripes are deliv- of the node u is discarded (there is no need for u toered to subscribers via multicast trees. To ensure that the maintain a routing table entry to itself).failure of a single node does not compromise all stripes, the d. At the end of the partitioning process, u willtrees are constructed to be inner node disjoint. In other have created ðB À 1Þ logB N contiguous parts ofwords, no node will be an inner node of more than one the id space, with sizes N ; B2 ; . . . ; BðlogNNÞÀ1 ; and 1, B N Bmulticast tree. Therefore, if a single node fails, no more than and with B À 1 parts of each size.one stripe is lost as a result. Inner node disjoint trees are 2. For each part P , u selects a node v 2 P that covers Pconstructed by selecting roots that vary in terms of the node and places it in its routing table. A node is said toprefixes. MAXDISJOINT creates disjoint paths in much the cover a part P if its routing table contains k > 1 entriessame way in Pastry. that cover the nonempty parts P1 ; P2 ; . . . ; Pk and P ¼ P1 [ P2 [ Á Á Á [ Pk . By definition, a node u is said2.3 General Issues in Peer-to-Peer Network Security to cover the part consisting of its id. This definition,Consistent node identity is critical to key placement and combined with partitioning of P into nonempty parts,routing. Most structured networks place a key at the node ensures that the recursive definition of coveragewith identifier “closest” to the key identifier. If nodes do not terminates.maintain a single, consistent identity, key placement can be The partitioning and routing table construction is showncompromised. Furthermore, an adversary that can take graphically in Fig. 1.multiple identities has the ability to partition the network Note that tree-based routing DHT implementations, . Secure admission control protocols are necessary to differ in the manner in which Property 2 is satisfied. Forlimit the number of identities an entity can obtain . example, in Chord , since routing is performed in theOther works have implemented node identities in a clockwise direction, the most counterclockwise node in eachcensorship-resistant, anonymous fashion , . part must be selected because it is the only node that covers Nodes must also be able to store keys fairly in a manner the entire id segment. In prefix-matching routing DHTs, allthat allows for verification. A number of works have aimed of the nodes within a part share a common prefix and coverto create self-certifying data. CFS  uses the cryptographic the entire id segment; therefore, any node within the parthash of a data item as its key. PAST  relies on may be chosen as a routing table entry. This explains thecryptographic signatures to verify data integrity. An flexibility in choosing routing table entries in DHTs likealternative is to use replication and Byzantine-fault-tolerant Pastry  and Tapestry .algorithms  to maintain data consistency and correctness. Routing is performed by forwarding the messageSelf-certifying data are discussed in more detail in Section 6. destined for the id d to the entry that covers the id segment that contains d. We define a DHT constructed in this manner to be a tree-based routing DHT. If the paths from any3 DISTRIBUTED HASH TABLES WITH TREE-BASED source node to all possible destinations are aggregated, the ROUTING resulting topology is a tree, which is how tree-based routingDistributed hash tables are often referenced by their gets its name. Note that many popular DHT implementa-geometry, i.e., ring (Chord ), torus (CAN ), tree tions , , , ,  exhibit these properties and,
422 IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, VOL. 8, NO. 3, MAY/JUNE 2011 Proof. Since the id segment covered by each hop must be a subset of the id segment covered by the previous hop, the minimum ratio between the size of id segments covered by consecutive hops is the branching factor B. Therefore, the longest path repeatedly divides N by B with each hop until it reaches the destination. This requires logB N hops. u t 3.2 Creating Disjoint Routes with Tree-Based Routing Determinism and routing convergence provide a natural avenue for creating disjoint routes. Routing convergence guarantees that once a path enters a segment of the id space, it will never proceed to a node that is outside of that segment. If two paths can be created originating in different segments, then the paths are guaranteed to be disjoint. Furthermore,Fig. 1. Pastry routing table structure for the hypothetical node 121 the determinism property ensures that any two routing table(N ¼ 64; B ¼ 4). The space is partitioned into ðB À 1Þ logB N ¼ 9 entries will route to different segments. Therefore, we cansegments. The highlighted region marks the segment to which node create disjoint routes simply by routing through different121 belongs. routing table entries. This is stated formally in the following lemma.therefore, employ a tree-based routing scheme. These DHTshave a number of useful properties, which we prove below. Lemma 4. In a full tree-based routing DHT, routes originating at First, tree-based routing DHTs are deterministic; that is, a common source node with different first hops are disjoint.given a message destined for a node d, each node has one Proof. Suppose two routes originating at a common sourceand only one routing table entry through which the node n have first hops e1 and e2 , such that e1 and e2 aremessage can be forwarded to d. different routing table entries of n. Using the determin-Lemma 1 (Determinism). For the routing table of any node in a ism property, if e1 and e2 cover id segments I1 and I2 , tree-based routing DHT, if the entries e1 and e2 cover id respectively, then I1 I2 ¼ ;. segments I1 and I2 , respectively, then I1 I2 ¼ ;. Consider any hops h1 and h2 in the routes beginningProof. This follows naturally from the partitioning of the with first hops e1 and e2 , respectively. Suppose hops h1 0 0 id space. u t and h2 cover the id segments I1 and I2 , respectively. 0 Using the routing convergence property, I1 & I1 and 0 0 0 Routes in tree-based routing DHTs are guaranteed to I2 & I2 . Since I1 I2 ¼ ;, I1 I2 ¼ ; and, therefore, theconverge. This property holds when the DHT is full1; that is, routes are disjoint. u tevery possible id is represented by a node. The source node can use any of its routing table entriesLemma 2 (Routing Convergence). Consider the route in a full as the first hop in a route; therefore, we can state the tree-based routing DHT from a source node s to a destination d number of disjoint routes that can be created from any represented as a series of nodes n1 ¼ s; n2 ; . . . ; nkÀ1 ; nk ¼ d source node by counting routing table entries. such that ni is some entry ej from the routing table of node niÀ1 for i > 1. Suppose that n1 ; n2 ; n3 ; . . . ; nk cover the id Lemma 5. In a full distributed hash table that employs tree-based segments I1 ¼ I; I2 ; I3 ; . . . ; Ik .2 Then, routing, there are at most d disjoint routes from any source node, where d ¼ ðB À 1Þ logB N. Ik ¼ fdg & IkÀ1 & IkÀ2 & Á Á Á & I2 & I1 : Proof. Employing Lemma 4, we can create a disjoint route forProof. Consider the hop from nj to njþ1 . Since the node nj each of the d routing table entries by routing to a covers the id segment Ij , it must partition Ij into B equal destination in the segment covered by each entry. We sized id segments. Since njþ1 covers Ijþ1 , which is one of cannot create d þ 1 or more disjoint routes because two or the parts of Ij , then Ijþ1 & Ij . Furthermore, since the more routes would share the first hop and overlap. u t DHT is full, the node nkÀ1 has a part that contains only the destination d. u t The proof for Lemma 5 alludes to choosing multiple Furthermore, it is possible to bound the number of hops destinations to create disjoint routes. This lends itselfin every route; we state this formally below. naturally to a replica placement. In the following section, we propose a replica placement that creates disjoint routes,Lemma 3 (Bounded Path Length). Any path in a full tree- which we call MAXDISJOINT. based routing DHT has at most logB N hops. 1. Some tree-based routing DHT implementations have to provide an 4 THE MAXDISJOINT REPLICA PLACEMENTadditional mechanism to ensure routing convergence when the DHT is notfull. Pastry, for example, maintains the neighborhood and leaf sets for this Using Pastry as an example in the following sections, wepurpose. 2. Any node can cover the entire id space; therefore, we can state that n1 will demonstrate how the properties of tree-based routingcovers I. can be used to construct a replica placement that creates
HARVESF AND BLOUGH: REPLICA PLACEMENT FOR ROUTE DIVERSITY IN TREE-BASED ROUTING DISTRIBUTED HASH TABLES 423disjoint routes. We begin with an example placement toprovide the reader with some intuition and then movetoward a more formal definition. After defining theplacement, which we call MAXDISJOINT, we will evaluatethe necessary replication degree to create a desired numberof disjoint routes. Then, we will introduce the notion of arun and provide an expression for the maximum tolerablerun length for a given replication degree. Next, we willdiscuss why MAXDISJOINT is a more adaptive and flexiblesolution than equally spaced replication. Finally, weoutline the basic elements of an implementation of theMAXDISJOINT placement.4.1 Intuition behind MaxDisjoint PlacementTo create route diversity in Pastry via replica placement, itis necessary to place replicas such that a given replica set Fig. 2. MAXDISJOINT placement for object 101 to create five disjoint routes from query node 121 (N ¼ 64; B ¼ 4).will use a diverse set of routing table entries for everypossible source node. We use an example to provide the target this routing table entry: 011, 111, 211, and 311. One ofnecessary intuition. Consider a Pastry ring with id space of these four replicas will create an additional route for everysize 64 and prefix matching in base-4 digits. We show the possible source node depending on its prefix. The remain-Pastry routing table structure graphically for a hypothetical ing three will be routed through previously used routingnode 121 in Fig. 1. table entries overlapping a previous route. This is shown Suppose we wish to replicate an object with id 101 in this graphically in Fig. 2. Five disjoint routes are created forPastry ring. Node 121 routes to this object through the node 121, one each for the replicas R001 (or R011), R101,routing table entry marked “10x” in Fig. 1. Suppose we R111, R201 (or R211), and R301 (or R311). In a similarreplicate the object with the id 111 to target the routing table fashion, we can create a sixth disjoint route using theentry “11x” in the example. This approach creates anadditional disjoint route for any lookups for object 101 replicas 021, 121, 221, and 321; and a seventh using 031, 131,originating at node 121. One route is forwarded via the 231, and 331. This pattern continues until the entire id space isentry “10x” and the other is forwarded via “11x.” However, exhausted. Note that in Pastry each node partitions theconsider another source node 221. This node routes to theobject 101 and 111 through the same entry marked “1xx” id space using prefixes and, therefore, we place replicas byand, therefore, does not gain an additional disjoint route. varying their prefixes. However, in general, we are simply To move toward a more effective approach, consider all spacing replicas such that replicas exist in different parts ofthe replicas of object 101 that would create an additional each node’s partitioning of the id space. In the next section,disjoint route for node 121. These are: 001, 111, 120, 122, 123, we provide an algorithm that generates these replica ids.131, 201, and 301. Note that there are total of nine possible 4.2 Definition of MaxDisjoint Placementdisjoint routes3 (including the route to the object 101), which MAXDISJOINT assigns each replica an identifier, which isis the number of routing table entries for node 121. Of these used to determine its placement. The placement algorithmreplicas, there are only three that can create an additional takes as input N, the identifier space size of the DHT; B, thedisjoint route for every possible source node: 001, 201, and branching factor; and d, the desired number of disjoint301. These replicas create disjoint routes by targeting entries routes. We will prove that MAXDISJOINT creates the desiredin the first row of the routing table. Note that targeting an number of disjoint routes in a later section.entry in the first row of the routing table requires a singlereplica whose id differs from that of the master object in the Algorithm 1 (MAXDISJOINT Replica Placement). To createfirst digit. To target entries deeper in the routing table, a d disjoint routes, replicas are placed in m þ 1 rounds, where dÀ1larger number of replicas are required. m ¼ bBÀ1c. Each round consists of B À 1 steps except for Suppose we wish to create five disjoint routes for all the final round, which consists of n steps, wherepossible source nodes. Four routes can be created for every n ¼ ðd À 1Þ mod ðB À 1Þ. In the ith round, BiÀ1 replicaspossible source node using the three replicas we have are placed at equally spaced locations over the entirealready discussed (001, 201, and 301) in addition to the identifier space at each step. In step j of round i, theobject 101. To create the fifth route, we must target an entry replica locations are given by:deeper in the routing table. In the case of node 121, we may Ri;j ¼ fki;j ; ki;j þ si ; ki;j þ 2si ; . . .choose the replica 111. As alluded to before, this replicaonly creates a disjoint route for those source nodes whose . . . ki;j þ ðBiÀ1 À 1Þsi g ðmod NÞ;ids start with the prefix “1” because these are the only N where ki;j ¼ k þ j Bi .nodes with an entry for “11x.” Since there are four possiblevalues for the prefix (B ¼ 4), four replicas are required to Looking at the example in Fig. 2, consider the placement 3. There is actually a tenth “zero-hop” path that can be created by of an object 101 in a DHT with B ¼ 4 and N ¼ 64. Supposeplacing a replica at 121. we want to create d ¼ 5 disjoint routes. Then, we perform
424 IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, VOL. 8, NO. 3, MAY/JUNE 2011m þ 1 ¼ 2 rounds of the replica placement algorithm and As a corollary, we give the necessary replication degreen ¼ 1 step in the final round. The replica locations and to create d disjoint routes.the corresponding rounds and steps of the algorithm are Corollary 1. To produce d ðB À 1Þ logB N disjoint routes fromgiven below: any query node to a key k in a full tree-based routing DHT Round 1; Step 1 : 201 with branching factor B > 1, the key k must be replicated at Round 1; Step 2 : 301 ðn þ 1ÞBm locations determined by the MAXDISJOINT dÀ1 Round 1; Step 3 : 001 Algorithm, where m ¼ bBÀ1c and n ¼ ðd À 1Þ mod ðB À 1Þ. Round 2; Step 1 : 111; 211; 311; 011 Proof. Since we have proved that d disjoint routes are indeed created by the algorithm, we need to show that As described in the replica placement algorithm, in each performing the algorithm with input d places a copy of kround replicas are placed starting at the master key and at ðn þ 1ÞBm locations in the id space. The key k accountsworking in the direction of increasing identifiers. The for one location and the remaining locations arealgorithm is presented as such for its simplicity. However, determined by the replica placement. Since, BiÀ1 replicasthe steps within each round can be performed in any order. are placed at each step in round i, the total number ofEach step is functionally equivalent to the others in its replicas is given by:round. Therefore, a real implementation may reorder thesteps in each round to distribute the replicas more uniformly ! ! replicas in replicas inacross the identifier space. This will help to provide load ðkeyÞ þ þbalance and tolerate runs of contiguous failed nodes. For first m rounds last n stepsinstance, in the example above, to create two disjoint routes, X mwe need only the master object (101) and one of the replicas ¼1þ ðB À 1ÞBiÀ1 þ nBmcreated in round 1 (001, 201, or 301). Any of these replicas i¼1would create the second disjoint route, but choosing replica ¼ ðn þ 1ÞBm :301 creates a more uniformly distributed replica set. t u4.3 Evaluation of Disjoint Routes We give our replica placement the name MAXDISJOINTThe desired number of disjoint routes d is one of the tunable because Corollary 1 prescribes the minimum replicationinputs to the MAXDISJOINT algorithm. Controlling the level degree to create the desired number of disjoint routesof fault tolerance is an important design parameter and, using this placement. We state this formally in thetherefore, d is a very useful input. In this section, we will following theorem:formally prove that the algorithm indeed creates d disjointroutes in support of the intuition provided in earlier Theorem 2. To produce d disjoint routes from any query node tosections. a key k in a tree-based routing DHT with B > 1, the key k In our analysis, we assume that routing is performed in must be replicated at no fewer than ðn þ 1ÞBm locationsan identifier space of size N with branching factor B. All of determined by the MAXDISJOINT Algorithm, where m ¼ dÀ1our analytical results are proved within the context of a full bBÀ1c and n ¼ ðd À 1Þ mod ðB À 1Þ.DHT, but we will show, through experimentation, that Proof. Assume d disjoint routes can be created with ðn þ 1Þthese properties hold even in sparsely populated DHTs. Bm À 1 locations determined by the MAXDISJOINT Our principal goal is to prove the following theorem: dÀ1 Algorithm, where m ¼ bBÀ1c and n ¼ ðd À 1Þ modTheorem 1. The MAXDISJOINT Algorithm produces d ðB À 1Þ. In other words, d disjoint routes are created ðB À 1Þ logB N disjoint routes from any query node to a with one fewer replica than prescribed by Theorem 1. Let key k in a full tree-based routing DHT. r be the missing replica. We will show that there exists a query node q for which d disjoint routes are not created.Proof. Every set Ri;j is a unique set of BiÀ1 replicas equally N Suppose r 2 Ri;j , then let q be a node in ½r; r þ j Bi Þ. The spaced over the entire id space, which implies an replica r is the only replica in Ri;j that can create a interreplica spacing of BN . Consider a source node u iÀ1 disjoint route for the query node q. Therefore, step j in and one of its routing table parts P ¼ ½u; u þ BN Þ. For iÀ1 round i does not create a disjoint route and we cannot each Ri;j , there exists one replica rk 2 Ri;j such that form d disjoint routes with one fewer replica. u t rk 2 P . Furthermore, the replicas frk g will be equally It is worth noting that MAXDISJOINT provides these spaced within P , separated by an interreplica spacing of N properties without modifying the underlying routing Bi . Regardless of how u chooses to partition P for its mechanism. MAXDISJOINT naturally creates disjoint routes routing table, there exists a replica in each part of P . using the properties of the tree-based routing scheme. Therefore, there is a unique routing table entry for each replica rk 4 and a disjoint route is created in every step of 4.4 Chord as a Tree-Based Routing DHT the algorithm. Using the definitions of m and n in the To provide consistency with previous work , we algorithm, it is easy to show that d À 1 steps are reconsider Chord as a tree-based routing DHT. It is performed. The d À 1 disjoint routes created in the straightforward to show that Chord finger tables are algorithm are combined with the route to k to create a constructed like tree-based routing tables with B ¼ 2. There- total of d disjoint routes. u t fore, we can apply Theorem 1 to give the following corollary: 4. It is possible that one of the replicas is placed at u. Even though no Corollary 2. To produce d logB N disjoint routes from anyrouting is performed to fetch this replica, we count it as a route of zero hops. query node to a key k in a full tree-based routing DHT with
HARVESF AND BLOUGH: REPLICA PLACEMENT FOR ROUTE DIVERSITY IN TREE-BASED ROUTING DISTRIBUTED HASH TABLES 425 branching factor B ¼ 2, the key k must be replicated at 2dÀ1 equally spaced locations in the ring.Proof. When B ¼ 2, m ¼ d À 1 and n ¼ 0. Therefore, d disjoint routes are created by replicating k at 2dÀ1 locations in the ring. Furthermore, round i will place 2iÀ1 replicas equally spaced over the id space. Aggregating the replicas placed in each round will result in 2dÀ1 replicas equally spaced over the id space with interreplica spacing 2N . dÀ1 u t Note that the claim in Corollary 2 is consistent with thefindings in .4.5 Toleration of Runs Fig. 3. Two methods for increasing the replication degree when using equally spaced replica placement.We define a run of length l starting at peer m to be thecontiguous set of peers with identifiers in the interval½m; m þ lÞ. As indicated in , an adversary can create In a Pastry DHT with a typical value of B ¼ 16 andimbalance in the distribution of peers in the DHT by 16 replicas, an adversary may control a run of more thanappropriately selecting identifiers. In the worst case, an 85 percent of the identifier space and there will be a route toadversary can take control of a contiguous sequence of at least one replica for every possible query. As theidentifiers or, using our terminology, a run of peers. replication degree approaches the identifier space size, the We claim that MAXDISJOINT replication can tolerate maximum length run tolerable by the DHT approaches BÀ1adversarial runs of bounded length in tree-based routing B N.DHTs. Before proving the tolerable length of a run, we 4.6 Adaptivity and Flexibility of MaxDisjointprovide some intuition of how a run may be used to disrupt There is a noted similarity between the MAXDISJOINT androuting in the DHT. Consider a query node q. An adversary 1 equally spaced placements. In fact, when n ¼ 0, MAXDIS-can reduce the number of replicas reachable from q by B by N JOINT produces equally spaced replica locations. One maycontrolling the peer with identifier q þ B , where N is the argue that equally spaced replica placement is as effective asidentifier space size and prefix matching is performed in MAXDISJOINT in creating disjoint routes. However, replicabase B. This is because all of the replicas in the interval placements have other desirable properties other than their n 1½q þ N ; q þ 2 N Þ are routed through the node q þ B (B of all B B ability to create route diversity; equally spaced placementreplicas are in this interval). If the adversary controls a run fails to deliver some of these benefits when B > 2.of nodes ending at q þ BÀ1 N , he can control a larger number B B Two properties in particular are adaptivity and flexibility.of replicas. Adaptivity is the ability to easily change the replicationTheorem 3. A full tree-based routing DHT with an identifier degree of an object without incurring a large overhead. If the space of size N, and r ! B replicas placed using MAXDIS- replication degree of an object is changed, we would like to JOINT placement can tolerate any adversarial run of length minimize the number of messages exchanged and objects 1 þ NðBÀ1 À B1m Þ, where m ¼ blogB rc. B shifted. Flexibility is the ability to easily vary the replicationProof (By Induction). Since we are using MAXDISJOINT degree of different objects without incurring a large over- placement, it is straightforward to show that m ¼ head. Certainly, some objects may be more popular or critical dÀ1 blogB rc ¼ bBÀ1c. This implies that the maximum length than others and require a higher degree of replication. The run tolerable by the DHT changes with each round of the placement should replicate objects to varying degrees MAXDISJOINT algorithm. without using excessive time or state at the time of insertion Consider a peer q. For B r < B2 , m ¼ 1 and there and lookup. exists a replica k in the interval ½q; q þ N Þ. If we assume MAXDISJOINT provides adaptivity more effectively than B that the adversary has control of the peer q þ ðB À 1Þ N an equally spaced placement. A change in the replication B (which is the worst case), then we must ensure that k is degree must be handled carefully to maintain equal spacing. Consider an increase in the replication degree to not in the run. Thus, the maximum length run is add an additional disjoint route. With MAXDISJOINT, the ½q þ N ; q þ ðB À 1Þ N Þ, which has length ðB À 1Þ N À N þ B B B B additional replicas are placed at the locations determined 1 or 1 þ NðBÀ1 À BÞ. 1 B by performing another step in the placement algorithm Assume that the maximum length run tolerable with leaving the existing replicas in their current locations. r replicas is 1 þ NðBÀ1 À B1m Þ, where m ¼ blogB rc. Con- B With equally spaced replicas, additional replicas can be sider a query node q. If we assume the adversary takes introduced in two different ways. The first option is to control of the peer q þ ðB À 1Þ N , then he does not control B compute the equally spaced replica locations for the new N any peers in the interval ½q; q þ Bm Þ. Furthermore, there replication degree and shift existing replicas, if necessary. In must be at least one replica in this interval. If another some cases, the existing replica locations will not be a subset round of the MAXDISJOINT algorithm is performed, then of the new replica locations. This implies that existing there will be B replicas in this interval separated by a replicas will have to be shifted, which has a non-negligible spacing of BN . Thus, the length of the tolerable run mþ1 cost. The second option is to double the current replication increases by BN to 1 þ NðBÀ1 À Bmþ1 Þ. mþ1 B 1 u t degree such that no existing replicas must be shifted. These two options are depicted graphically in Fig. 3.
426 IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, VOL. 8, NO. 3, MAY/JUNE 2011 Next, the DHT lookup primitive must be modified to accommodate the new replication scheme. When a peer is queried for a key, the query node must compute the locations of all replicas. The key identifier is used to route to the replicas. Once a replica’s home node is found, the key identifier pairs are compared to return the appro- priate replica. It is worth noting that no additional routing table entries are required to route to the replicas. In addition, if the query node dispatches the lookups for the entire replica set simultaneously, there may be an improvement in perfor- mance because the query node can return the first correct response received (which may have returned along a route shorter than the route to the master key). However, if the added load of the extra lookups puts strain on the system,Fig. 4. Replication degree for increasing numbers of disjoint routes the performance may improve only slightly or even degrade.(B ¼ 16). We evaluate this hypothesis through experimentation. Finally, when peers join or leave the DHT, the DHT join Doubling the replication degree avoids the cost of and leave mechanisms can be used by simply ignoring theshifting existing replicas, but may come with the added virtual peer identifiers in each key identifier pair. Note thatburden of storing an excessive number of replicas. The DHT replica placement schemes that place replicas in thenumber of replicas prescribed by Theorem 1 is sufficient. neighborhood of the master key require modification to theFor example, in Fig. 3, only four additional replicas are node join and leave mechanisms. To maintain the replica-needed to create an additional disjoint route; however, tion degree, replicas will need to be shifted for every joindoubling the replication degree introduces eight new or leave. Ghodsi et al.  discuss the effect of churn onreplicas. The excess storage burden created when doubling symmetric (equally spaced, MAXDISJOINT) replication andthe replication degree is shown quantitatively for B ¼ 16 in show that only Oð1Þ messages are needed to maintain theFig. 4. MAXDISJOINT is able to create a desired number of replication degree for every join or leave compared todisjoint routes more effectively than equally spaced place- ðrÞ messages for a successor-list (neighbor set) placement,ment when there is a change in the replication degree. where r is the replication degree. A sound replica placement also provides flexibility;that is, the replication degree of one data item is notdependent on the replication degree of any other data 5 EXPERIMENTSitem. Fortunately, flexibility can be provided easily by To confirm that our analytical results hold for sparselyMAXDISJOINT placement. populated DHTs or DHTs with clustered id spaces, we The problem of flexibility arises at the time of lookup. conducted a series of experiments through simulation. First,Without knowledge of the replication degree of the target, to measure the impact of replica placement on routingthe replica locations cannot be determined. As a solution, robustness, we consider the number of disjoint routesrather than fixing a system-wide replication degree for all created for several replica placements. Furthermore, wedata items or storing the replication degrees for all objects, find the probability of lookup success when nodes arewe fix a maximum replication degree rmax for all objects. For compromised at random or in a run of several nodes.a data item with replication degree r rmax , the r replicas Second, we consider a heuristic used for creating routewill be located at a subset of the rmax replica locations. As a diversity in  that we call neighbor set routing. We measureresult, for some lookups, we may query a replica that does its ability to create route diversity and the impact on thenot exist, but we trade off these extra messages for the probability of lookup success.reduction in node state. Finally, having shown that replica placement can4.7 Implementation improve routing robustness, we consider the impact of parallel queries on response time.To uniquely identify each replica, we use a key identifier pairðk; vÞ, where k is the key identifier and v is virtual key 1. Simulation Environment: All of our experiments wereidentifier. For each replica, v gives the location of the master performed using a Java-based simulator we devel-key. By definition, the master key k is denoted by the pair oped. We are able to model Chord and Pastryðk; kÞ. We require an ordered pair because two data items routing, uniform and clustered node distributions,may have replicas that reside on the same peer. and two adversarial models. Nodes may be com- When a key k is inserted into the DHT with replication promised at random with some failure probability ordegree d, we first compute the key identifier pairs for each in a run of several nodes. The simulator is extensiblereplica: ðk; kÞ; ðk1 ; kÞ; ðk2 ; kÞ; . . . ðkdÀ1 ; kÞ. Once the key iden- to model other DHT implementations, node dis-tifier pair for each replica is computed, we use the DHT key tributions, and adversarial models.insertion mechanism to insert the replicas. That is, we Each data point in our results is representative ofperform a lookup for each key identifier and store the over 100,000 lookups performed in 10 differentreplica in their respective locations. random node distributions. We simulate a lookup
HARVESF AND BLOUGH: REPLICA PLACEMENT FOR ROUTE DIVERSITY IN TREE-BASED ROUTING DISTRIBUTED HASH TABLES 427 TABLE 1 Pastry Simulation Parameters by randomly selecting an uncompromised query node and a target key. In reality, if the query node is compromised, it can affect the outcome of the lookup. However, if we assume that data items are self- Fig. 5. Number of disjoint routes with increasing replication degree in verifying, a compromised query node can only cause Pastry (N ¼ 228 ; n ¼ 8;192; B ¼ 16). the client to time out and select another query node. We deem a lookup successful if there exists a route set placement. In the extreme case, if the replica spacing is consisting of only uncompromised nodes from the much less than the average internode spacing, two or more query node to any replica of the target key. If all replicas may be placed at the same node. We observe this routes from the query node to the replica set contain a phenomenon for the spaced placement with s ¼ 105 in compromised node, then the lookup is deemed to fail. Fig. 5. As we increase the spacing, there is a tendency to For most experiments, it is sufficient to compute increase the number of disjoint routes. However as we routes in the network using the appropriate continue to increase the replication, we will reach a routing protocol. However, to measure response saturation point where replica locations wrap around the time, it was necessary to modify our simulator to identifier space and no additional disjoint routes will be be event driven. The remaining simulation para- created. We observe this phenomenon when the spacing meters are summarized in Table 1. s ¼ N=6. This implies that the spacing should be a function 2. Replica Placements Considered: In our experiments, we of the replication degree, which is fundamental to how consider four replica placements: MAXDISJOINT; MAXDISJOINT creates disjoint routes. neighbor set, where replicas are placed at distinct The random placement creates nearly as many disjoint nodes in the neighborhood of the root (e.g., Chord routes on average because replicas are uniformly distributed. successor list, Pastry leaf set); random, where replica However, there is a significant difference between MAXDIS- locations are uniformly distributed; and spaced, JOINT and random placement in the worst case. We present where replica identifiers are separated by a uniform spacing s. It is worth noting that two replicas may be an argument in support of this claim at the end of this section. placed at the same node with spaced replication. To ensure that this number of disjoint routes could be In the case of neighbor set placement, some created in more sparsely populated id spaces, we varied the implementations may attempt to reduce load by number of nodes in the DHT starting from 32 and measured querying the entire replica set with a single lookup the number of disjoint routes for various replication degrees. message. This naturally creates route overlap; for a These data are depicted in Fig. 6. The actual number of fair assessment, we dispatch a separate lookup for disjoint routes converges quickly to the theoretical value, each replica in the replica set. which implies that MAXDISJOINT placement can be used effectively in very sparsely populated networks. In these5.1 Measurement of Disjoint RoutesFirst, to verify the correctness of our analysis, we measurethe average number of disjoint routes created using theconsidered replica placements. These results are depictedin Fig. 5. For the parameters tested, MAXDISJOINTplacement outperforms the other placements in creatingdisjoint routes. The neighbor set placement does not create a significantnumber of disjoint routes as expected. Routes toward keysthat are close to each other in the identifier space are likelyto converge. Since the neighbor set placement clustersreplicas, an adversary can eliminate the entire replica set ifhe can compromise a node common to all routes destinedfor that neighborhood. By increasing the route diversity, weeliminate these single points of vulnerability. The performance of the spaced placement scheme isdependent on the spacing chosen. If the spacing is small, Fig. 6. Number of disjoint routes with increasing number of nodes inthen the spaced placement is very similar to the neighbor Pastry (N ¼ 228 ; B ¼ 16; d 2 f2; 3; 4; 5; 6g).
428 IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, VOL. 8, NO. 3, MAY/JUNE 2011Fig. 7. Disjoint routes for tightly clustered nodes in Pastry Fig. 9. Percent decrease in disjoint routes from uniformly(N ¼ 228 ; n ¼ 8;192; ¼ 215 ). distributed nodes to a clustered distribution in Pastry (N ¼ 228 ; n ¼ 8;192; C ¼ 4; ¼ 1016 ).results, the theoretical number of disjoint routes is achievedfor loads greater than 1,000 nodes, which is less than a happen when creating a large number of disjoint routesthousandth of a percent of the identifier space. (which requires more replicas). We computed the percentage decrease in disjoint routes5.2 Resilience to Node Clustering that results from clustering. This analysis produced someTo model the population distribution that may result from very interesting results that are depicted in Fig. 9. Onethe collusion of several malicious nodes, a series of interesting conclusion is that not all replica placements areexperiments were run on clustered DHTs. To model a negatively affected by node clustering. In particular, whenclustered distribution, four node ids were randomly replicas are placed close together in the id space, e.g.,selected as cluster means such that the clusters are neighbor set or spaced (for small spacings) placement, there can actually be an increase in the number of disjoint routesnonoverlapping and unpopulated gaps exist in the id created. This is because a long string of closely packedspace. The cluster density was tuned using the standard replicas may span a gap between clusters, which pushesdeviation of the Gaussian distribution centered around each some of the replicas to another cluster. The clusteringcluster mean. The number of disjoint routes created for ¼ actually helps distribute the replicas better resulting in215 and ¼ 216 are shown in Figs. 7 and 8, respectively. more disjoint routes. Nevertheless, the 4-6 percent increase Clustering can marginally reduce the number of disjoint in the number of disjoint routes is insubstantial becauseroutes created for small replication degrees, but the impact these placements fail to create a sufficient number ofis more dramatic when creating a larger number of disjoint disjoint routes with uniformly distributed nodes.routes. The impact of clustering is intensified with tighter To the contrary, the MAXDISJOINT and random place-clusters (i.e., decreasing ) because replicas are not located ments are affected negatively by clustering. Placements thatat the positions that MAXDISJOINT prescribes. Although the distribute replicas in the id space may place a replica in thereplica ids are properly assigned, the replicas themselves unpopulated gaps between clusters. These replicas getare confined to the clusters. One or more of the replicas may pushed to a cluster and a disjoint route may be lost. Thislay in the unpopulated gaps between clusters. These phenomenon takes effect for MAXDISJOINT when r ¼ 8. Atreplicas get “pushed” to next cluster in the id space, this point, the interreplica spacing is 225 . If we use twopossibly eliminating a disjoint route. This is more likely to standard deviations to capture 95 percent of each cluster, we can estimate the intercluster gaps to about 226 , which is twice the interreplica spacing. That implies that about half of all replicas are located in the gaps between clusters. Note that the random placement suffers from this problem over the entire range of replication degrees because it is not guaranteed to distribute replicas across the entire id space. Furthermore, as we increase the replication degree, more replicas will be located in the gaps and we lose the benefit of increased replication degree. Nevertheless, the three to five percent decrease in the number of disjoint routes is relatively insignificant because MAXDISJOINT creates so many more disjoint routes than the other placements in a uniformly populated id space. 5.3 Impact of Replica Placement on Routing Robustness To demonstrate that the number of disjoint routes has aFig. 8. Disjoint routes for loosely clustered nodes in Pastry significant impact on the robustness of the DHT, we(N ¼ 228 ; n ¼ 8;192; ¼ 216 ). measure the probability of lookup success with a random
HARVESF AND BLOUGH: REPLICA PLACEMENT FOR ROUTE DIVERSITY IN TREE-BASED ROUTING DISTRIBUTED HASH TABLES 429Fig. 10. Probability of routing success with uniformly compromised Fig. 11. Probability of routing success with runs of compromised nodesnodes in Pastry (N ¼ 228 ; n ¼ 8;192; B ¼ 16; r ¼ 8). in Pastry (N ¼ 228 ; n ¼ 8;192; B ¼ 16; r ¼ 16).fraction of faulty nodes. A faulty node may be a failed or placement is expensive to implement in practice and thecompromised node. The probability of routing success is average performance of a random placement does notshown in Fig. 10. trickle down to the worst case. These results indicate a correlation between the number The worst case attack against a replica placement is toof disjoint routes and the probability of lookup success. target the replica locations themselves. One may argue that aThe MAXDISJOINT and random placements most effec- placement with random locations would make it moretively create disjoint routes and, thus, have the most difficult for an adversary to determine and target the replicapositive impact on the probability of routing success. set than a deterministic approach like MAXDISJOINT.Furthermore, we can conclude that using MAXDISJOINT However, a truly random placement also complicatesplacement instead of neighbor set placement can drama- matters for the query node. The query node must be abletically improve the probability of routing success. With a to compute the replica locations for any object at the time ofquarter of nodes compromised at random, greater than lookup. To avoid keeping a considerable amount of state at97 percent of all lookups can be resolved successfully with each node, the only practical proposal of which we are awareMAXDISJOINT placement compared to only 60 percent is to use multiple hash functions to generate “random”with neighbor set placement. replica locations. However, this implementation is not truly With the network configuration in Fig. 10, the MAXDIS-JOINT placement creates eight disjoint routes. Therefore, an random and, with knowledge of the hash functions, it is asadversary could prevent the correct resolution of a given vulnerable as MAXDISJOINT to attacks that target all replicaquery by compromising only eight nodes. However, with locations for a particular object. This makes “random”far more nodes than that compromised at random, nearly replication simply a different deterministic placement thatall queries are resolved successfully. This is a strong is, in a sense, an approximation of equally spaced replication.indication that replica placement plays a critical role in Yet, as we show next, the performance of random replicationproviding robustness in DHTs. falls considerably short of MAXDISJOINT. Our analysis indicates that MAXDISJOINT should be able The performance results we have shown thus far depictto tolerate runs of compromised nodes better than placements the average number of disjoint routes created. To considerthat cluster replicas closely together. Experimental results the worst case performance, we measured the minimumthat confirm this hypothesis are depicted in Fig. 11. With number of disjoint routes created for MAXDISJOINT and16 replicas and 85 percent of the identifier spaced compro- random placement. These results are depicted in Fig. 12.mised in a run, greater than 96 percent of lookups are resolved The results in Fig. 12 confirm our hypothesis of the worstsuccessfully with MAXDISJOINT placement compared to only case performance of a random placement. For a few objects,13 percent with neighbor set placement. Furthermore, these the placement may create as few as half of the desiredresults demonstrate an exploit of random placement. With number of disjoint routes. To establish that the worst casemoderately long runs, MAXDISJOINT is able to successfully was not an isolated case, we measured the number ofresolve a higher fraction of queries than the random disjoint routes created over all lookups. The cumulativeplacement. For example, with 85 percent of the identifier distribution of the number of disjoint routes with eightspaced compromised in a run, only 66 percent of lookups are replicas is shown in Fig. 13.resolved successfully with a random placement. This is For the case depicted in Fig. 13, MAXDISJOINT createdbecause the random placement may cluster replicas for a the expected eight disjoint routes for every single lookup.significant fraction of keys. We investigate this further in the Random placement created six or fewer disjoint routes forfollowing section. 45 percent of lookups and, in some cases, it created as few as four, or only half of the number produced by5.4 MaxDisjoint versus Random Placement MAXDISJOINT. We observed the same behavior in otherIn the experimental data presented thus far, random cases as well. Therefore, if delivering a consistent level ofplacement seems to have performed on par with MAXDIS- fault tolerance across all lookups is a design constraint,JOINT on average. We argue, however, that a truly random random placement is not a reasonable solution.
430 IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, VOL. 8, NO. 3, MAY/JUNE 2011 Fig. 14. Graphical depiction of neighbor set routing. 1. neither replication nor neighbor set routing,Fig. 12. Worst-case performance of MAXDISJOINT and random replica 2. only neighbor set routing through eight neighbors,placement in Pastry. The minimum number of disjoint routes created 3. only MAXDISJOINT placement with eight replicas,over all lookups is shown (N ¼ 220 ; n ¼ 8;192). and 4. both neighbor set routing and MAX DISJOINT5.5 Replica Placement and Neighbor Set Routing placement.We believe replica placement is an efficient way of creating These results are depicted in Fig. 15.disjoint routes because it does not require significant Both MAXDISJOINT placement and neighbor set routingmodification to the underlying DHT routing scheme. Other can have a positive impact on the probability of lookupworks have considered reusing the existing routing success. However, the trend is stronger with replicainformation to create route diversity , . Although placement. With a quarter of nodes compromised atthese approaches do not create provably disjoint routes, we random, over 97 percent of lookups are resolved success-believe there is value in introducing some additional form fully with MAXDISJOINT placement compared to 63 percentof route diversity. Furthermore, we believe that these with neighbor set routing alone. At best, neighbor settechniques could be combined with our replica placement routing can create independent routes, since all paths willto provide additional benefit. To evaluate this claim, we converge at the destination. If the destination node isconsider the route diversity technique introduced by Castro compromised, no amount of route diversity can increase the probability of lookup success.et al. , which we call neighbor set routing. Nonetheless, the added route diversity that neighbor set Castro et al. use neighbor set routing to find diverse routing provides can benefit MAXDISJOINT placement,routes toward the neighborhood of a key. To create diverse especially with a large fraction of compromised nodes.routes, messages are routed via the neighbors of the source With 50 percent of nodes compromised, the probability ofnode. This is depicted graphically in Fig. 14. Castro et al. lookup success of using MAXDISJOINT placement improvesclaim that this technique is sufficient in the case when from 52 to 84 percent with neighbor set routing.replicas are distributed uniformly over the identifier space,as in CAN and Tapestry. We consider the ability of 5.6 Parallel Queries and Response Timeneighbor set routing to create diverse routes to a replica Finally, since replica placement seems to be a reasonableto enhance the routing robustness of MAXDISJOINT. method for improving routing robustness, it is natural to To evaluate the relative impact of replica placement andneighbor set routing, we consider four scenarios: Fig. 15. Probability of routing success with neither replication nor neighbor set routing (None), neighbor set routing only (NBR), MAXDISJOINT placement only (MD), and both neighbor set routingFig. 13. Cumulative distribution of disjoint routes created for MAXDIS- and MAXDISJOINT placement together (NBR+MD) (N ¼ 228 , n ¼8;192,JOINT and random placement in Pastry (N ¼ 220 ; n ¼ 8;192; r ¼ 8). B ¼ 16, r ¼ 8).
HARVESF AND BLOUGH: REPLICA PLACEMENT FOR ROUTE DIVERSITY IN TREE-BASED ROUTING DISTRIBUTED HASH TABLES 431consider some practical concerns of using replication. Whenquerying a replica set, response time may be reduced byquerying the entire replica set in parallel. Alternatively, thiscould have a significant impact on the system load, whichmay result in congestion and increased response time.Therefore, we consider three replica query strategies:parallel, sequential, and hybrid. Unlike its counterpart, the sequential strategy queriesreplicas one at a time waiting for a response betweenreplicas. This strategy controls system load at the expense ofresponse time. With very few failures, the sequentialstrategy should result in reasonable response times withoutinducing a significant load on the network. However, theresponse time may not be as resilient to an increase infailure rate as the parallel strategy. We also consider a hybrid approach in which replicas arequeried in sets of two or more replicas. Sets are queried oneat a time waiting for a response before querying the next set.With this strategy, the trade-off can be managed using theset size to tune the response time with load. In figures, thehybrid strategy series are labeled “Hybrid-S,” where Sdenotes the set size. To present realistic response times, we model theinternode delay with a log-normal distribution with ¼60 ms and ¼ 50 ms and total response time as the sum ofinternode delays along a route. The log-normal distributionparameters were selected using results from a study of TCPconnection round trip times . We extend our fault model to assume that failed nodescorrectly forward lookups to create added system load, butreturn incorrect responses that the query node is able todetect. Therefore, a failed route will result in the samesystem load as a successful route, but will add to the overallresponse time of the lookup. In a real system where a failuremay result in no response at all, it may be necessary to use atime-out for the sequential and hybrid schemes. To measure the effect of message queuing, we measureresponse time for lookup rates varying from 1 Â 102 to1 Â 107 lookups per second. Since the underlying physicaltopology is difficult to predict and we are more concernedwith the queuing that results from our query strategy, wemodeled queuing in the overlay, rather than in thephysical network. We assume that each node in theoverlay is a leaf node in the underlying physical topologyand has a 1 megabit per second link to its gateway router.Furthermore, we assume that the message size is 1 kilobyte,which is consistent with real Pastry implementations. Fig. 16. Response time (ms) for successful lookups with increasing The average response times for the discussed replica lookup rate with f ¼ (a) 5 percent, (b) 10 percent, and (c) 15 percentquery strategies are depicted in Fig. 16. We repeated the (N ¼ 228 ; n ¼ 8;192; B ¼ 16; r ¼ 16).experiment for f equal to 5, 10, and 15 percent; these resultsare shown in Figs. 16a, 16b, and 16c, respectively. parallelization results in message queuing and delayed The trend across these graphs confirms our intuition that responses. This effect may be stronger in small networks,the response time of the sequential strategy increases with which have reduced capacity, and in networks with a higherthe fraction of nodes compromised in the network. replication degree.Furthermore, the response time does not seem to vary The hybrid strategies offer a suitable trade-off betweensignificantly with changes in the lookup rate because the the parallel and sequential strategy. The set size can beeffects of an increased lookup rate are not compounded by increased to reduce response times and improve resiliencethe parallelization of replica queries. To the contrary, the parallel strategy is not resilient to to changes in the fraction of compromised nodes. Tochanges in the lookup rate. As the lookup rate increases improve the resilience to changes in lookup rate, the setbeyond 10,000 lookups per second, the response time of the size should be decreased. The value of this parameterparallel strategy increases dramatically. With a relatively should be determined by the needs of the application. Forlow fraction of nodes compromised (Fig. 16a), the sequential example, caller lookup rates in a voice over IP (VoIP)strategy can even outperform the parallel strategy in terms application may change wildly with the time of day and aof response time. The additional load resulting from smaller set size should be used to control congestion.